The traditional assumption about memory is that a read returns the value written by the most 
recent write. However, in a shared memory multiprocessor several processes independently and 
simultaneously submit reads and writes resulting in a partial order of memory operations. In 
this partial order, the definition of most recent write may be ambiguous. Memory consistency 
models have been developed to specify what values may be returned by a read given that memory 
operations may only be partially ordered. Before this work, consistency models were defined 
independently. Each model followed a set of rules which was separate from the rules of every 
other model. In our work we have defined a set of four consistency properties. Any subset of the 
four properties yields a set of rules which constitute a consistency model. Every consistency model 
previously described in the literature can be defined based on our four properties. Therefore, we 
present these properties as a unified theory of shared memory consistency. 

Our unified theory provides several benefits. First, we claim that these four properties capture 
the underlying structure of memory consistency. That is, the goal of memory consistency is to 
ensure certain declarative properties which can be intuitively understood by a programmer, and 
hence allow him or her to write a correct program. Our unified theory provides a uniform, formal 
definition of all previously described consistency models, and in addition some combinations of 
properties produce new models that have not yet been described. We believe these new models will 
prove to be useful because they are based on declarative properties which programmers desire to 
be enforced. Finally, we introduce the idea of selecting a consistency model as an on-line activity. 
Before our work, a shared memory program would run start to finish under a single consistency 
model. Our unified theory allows the consistency model to change as the program runs while 
maintaining a consistent definition of what values may be returned by each read. 
1. INTRODUCTION 

Shared memory is a powerful abstraction for interprocess communication. The 
concept of shared memory originated from multiprogramming on uniprocessors and 
bus-based multiprocessors. In these environments there is a simple model of the 
memory system enforced in hardware. The model can be stated as: 

— There is a physical memory cell that represents each variable. The state of this 
memory cell is the state of the variable. 

— Memory operations take place sequentially. They are atomic and there is a total 
order on all memory operations. Read operations return the current state of the 
physical memory cell. Write operations change the current state of the physical 
memory cell and the change becomes observable to all processes simultaneously. 

— The operations of each process take place in the order specified by its program. 

These conditions are enforced by the hardware architecture. In a multipro- 
grammed uniprocessor there really is only one process submitting memory opera- 
tions at a time. In a bus-based multiprocessor with no cache, the bus serves as a 
serialization mechanism that allows operations to reach memory sequentially. 

For many years these assumptions were implicit and any computer scientist 
would tell you, "That's just how memory works." Then two things happened. 
The first is that memory systems in multiprocessors got more and more compli- 
cated [Dubois and Scheurich 1990; Dubois et al. 1986; Gharachorloo et al. 1990; 
Lenoski et al. 1990]. The second is the invention of distributed shared memory 
(DSM) for the message-passing multicomputer [Amza et al. 1996; Bennett et al. 
1990; 1995; Bershad et al. 1993; Bershad and Zekauskas 1991; Li 1986; Li and 
Hudak 1989]. Caching and out-of-order instruction dispatching can pose a problem 
for multiprocessors. The hardware of each processor enforces the restriction that 
the processor sees its own memory operations in the order specified by its program, 
but this does not automatically protect processors from seeing each other's oper- 
ations out of order. DSM provides the illusion of a shared address space on top 
of hardware that only supports message passing. In DSM systems, asynchronous 
messages and replicated copies of data can cause the same problems. 

These problems led to the concept of consistency models. A consistency model 
is a specification of the allowable behavior of memory. It can be seen as a contract 
between the memory implementation and the program utilizing memory [Tanen- 
baum 1995]. The input to memory is a set of memory operations (reads and writes) 
partially ordered by program order. The output of memory is the collection of val- 
ues returned by all read operations. A consistency model is a function that maps 
each input to a set of allowable outputs. The memory implementation guarantees 
that for any input it will produce some output from the set of allowable outputs 
specified by the consistency model. The program must be written to work cor- 
rectly for any output allowed by the consistency model. This idea was originally 
described by Lamport when he defined sequential consistency [Lamport 1979]. A 
sequentially consistent multiprocessor allows conventional reasoning about the cor- 
rectness of programs. Essentially, it allows the programmer to treat the machine 
as a multiprogrammed uniprocessor. Enforcing sequential consistency can be very 
costly. Soon weaker consistency models were discovered that were less expensive in 
terms of communication. Multiprocessors were generally used for large numerical 
programs that were already programmed with a constrained programming style to 
avoid data race conditions. With slight modifications to the programming style, 
algorithms could still be written to execute correctly for non-sequentially consistent 
memory systems. 

With consistency models, the concept of shared memory is no longer tied to the 
physical implementation of memory cells. A programmer can write a correct pro- 
gram using the abstractions of concurrent processes and shared memory with little 
knowledge about the underlying implementation that will eventually execute the 
program. All that the programmer needs to know is the consistency model enforced 
by memory. To give the memory implementor more flexibility for optimization, the 
memory might enforce fewer guarantees. Or, to make the programmer's job easier, 
the memory might enforce more guarantees. Many choices have been made 
along this continuum from ease of use to efficient implementation. The results are the 
consistency models described in the literature [Ahamad et al. 1991; Bershad and 
Zekauskas 1991; Dubois et al. 1986; Gao and Sarkar 2000; Gharachorloo et al. 
1990; Goodman 1989; Herlihy and Wing 1990; Hutto and Ahamad 1990; Iftode 
et al. 1996; Keleher et al. 1992; Lamport 1979; Lipton and Sandberg 1988]. 

This leads to the idea of shared memory as an application programming inter- 
face (API) as shown in Figure 1. The program and memory agree on a consistency 
model. Then the program executes using the shared memory API, and the pro- 
gram's processes share information in a common address space. No knowledge is 
needed of the memory implementation. 

This work also introduces the idea of on-line consistency model transitions. Prior 
to this research, the selection of a consistency model was seen as an off-line activity. 
A program would be written to operate under a particular consistency model, and 
it would be up to the user to run the program on a system which supported that 
consistency model. Instead, with consistency model transitions a program is allowed 
to select and change the consistency model at run-time. The consistency model 
becomes a tunable parameter to the shared memory API. This allows a program 
to select different consistency models for different phases of a computation. This 
requires that consistency models be extended with a transition theory to specify 
the allowed behavior of the memory system when processing pending operations 
submitted under more than one consistency model. 

One hypothesis of our work was that every consistency model is composed of 
various consistency properties, system-wide conditions that must be enforced, and 
that these properties can be combined in arbitrary ways to produce a lattice of 
consistency models. By defining every consistency model as a set of primitive prop- 
erties, transitions between models can be described as the addition or removal of 
various properties. For evaluation and validation, the new properties proposed in 
this paper are compared against existing definitions of consistency models. Existing 
consistency models fall into two classes: non-synchronized and synchronized 
models. Non-synchronized models have uniform consistency restrictions for 
all operations. Synchronized models have special operations (called synchronization 
operations) which have greater consistency restrictions than other operations. 
Non-synchronized consistency models from the literature are simulated by combinations 
of properties in the lattice. Synchronized models have two distinct types of 
operations that have different consistency requirements. Therefore, synchronized 
consistency models are simulated by consistency transitions. 

The first contribution of this work is the discovery of four fundamental consis- 
tency properties: process order, data order, write-read-write order, and anti order. 
These properties provide alternate definitions of well known non-synchronized con- 
sistency models and reveal a fundamental structure behind the models. Every 
non-synchronized model described in the literature can be formally described by 
some combination of these properties. The second contribution of this work is the 
concept of a consistency lattice. In the lattice, each pair of models has a unique 
least upper bound and a unique greatest lower bound. These define the minimum 
model required to enforce all conditions of both models, and the maximum set of 
conditions enforced by both models respectively. This lattice allows simple, direct 
comparison of models, and is a valuable resource for any application environment 
that uses more than one consistency model. The third contribution of this work 
is the new consistency models revealed by the structure of the lattice. Generating 
every possible combination of properties produces five combinations that are well 
defined consistency models that have not previously been discovered. The fourth 
contribution of this work is a transition theory that can be used to simulate well 
known synchronized consistency models. 
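
The lattice structure described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a consistency model is represented as the set of properties it enforces, so that the least upper bound is set union and the greatest lower bound is set intersection. The function names are hypothetical, and whether all 16 subsets are distinct, well-defined models is not settled by this sketch.

```python
from itertools import combinations

# The four property names come from the text; representing a consistency
# model as a frozenset of them is an assumption made for illustration.
PROPERTIES = ("process order", "data order", "write-read-write order", "anti order")


def all_models():
    """Every subset of the four properties (2**4 = 16 candidate models)."""
    return [frozenset(c)
            for r in range(len(PROPERTIES) + 1)
            for c in combinations(PROPERTIES, r)]


def least_upper_bound(a, b):
    """Minimum model that enforces all conditions of both a and b."""
    return a | b


def greatest_lower_bound(a, b):
    """Maximum set of conditions enforced by both a and b."""
    return a & b


def is_at_least_as_strong(a, b):
    """a is at least as strong as b if a enforces every property of b."""
    return a >= b


def transition(src, dst):
    """Describe a model change as properties to add and properties to remove."""
    return {"add": dst - src, "remove": src - dst}
```

Under this representation, a transition between two models is just the pair of set differences, which matches the description of transitions as the addition or removal of properties.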

2. RELATED WORK 
2.1 Shared Memory 

A common trend in the literature is the development of uniform frameworks and 
notation to represent several previously defined consistency models [Adve and Hill 
1993; Adve and Gharachorloo 1996; Bataller and Bernabeu 1997; Mosberger 1993]. 
Our unified theory is an improvement over these methods because we expose the 
underlying structure of declarative properties enforced by various models, and we 
predict new models that have not yet been discovered. There are currently two 
common methods of characterizing consistency models. One method is to describe 
restrictions on the way in which processes are allowed to issue memory operations, 
which we will call the "issue" method (e.g., see [Adve and Gharachorloo 1996]). 
Another method is to describe restrictions on the apparent order of events visible 
to processes, which we will call the "view" method (e.g., see [Bataller and Bernabeu 
1997]). Adve and Gharachorloo [Adve and Gharachorloo 1996] use the "issue" 
method of defining consistency models. They identify two conditions that together 
will enforce sequential consistency. They call these the process order property, and 
the write atomicity property. 

Process order property. Program order must be maintained among operations 
from individual processes. 

Write atomicity property. In cache based systems with multiple copies of a mem- 
ory location, writes must be atomic. 

The first condition can be enforced by having a process not issue an operation un- 
til all previous operations are complete. Complete means that a read has returned 
its value, or a write has been applied and acknowledged. The second condition can 
be enforced by a cache coherence protocol which does not acknowledge writes until 
every copy is updated or invalidated. Adve and Gharachorloo use this implemen- 
tation of sequential consistency as a basis for their definitions of other consistency 
models. Every other model is allowed to violate some of the restrictions required 
for sequential consistency. Violating a restriction allows for optimization in the 
implementation. They identify five optimizations that may be allowed. 

— Allow a read to be issued before a previous write is complete. 

— Allow a write to be issued before a previous write is complete. 

— Allow a read or write to be issued before a previous read is complete. 

— Allow a read to view another process' write before the write is applied everywhere. 

— Allow a read to view one's own write before the write is applied everywhere. 

The first optimization combined with the last two result in processor consistency 
as it was defined for the DASH multiprocessor [Lenoski et al. 1990]. All five opti- 
mizations combined result in slow consistency [Hutto and Ahamad 1990] which is 
used for non-synchronizing operations in synchronized consistency models such as 
release and weak consistency. For each consistency model, Adve and Gharachorloo 
describe a "safety net" which would enforce sequential consistency on top of that 
model. These safety nets consist of replacing certain operations with special pur- 
pose synchronization operations such as test and set or acquire/release. They also 
describe the concept of a programmer centric framework where for any consistency 
model a programmer can determine what synchronizations must be performed for 
a program to simulate sequential consistency on top of that model. 

The goal of consistency models in this view is to simulate sequential consistency 
with an efficient implementation. The tradeoff is speed versus complexity exposed 
to the programmer. Their work does not characterize the order of events as seen 
by any particular process in a non-sequential execution. Instead, they character- 
ize what changes a programmer must make to a program to simulate sequential 
consistency. Other work taking this view has been done to present an efficient, 
sequentially consistent interface to the programmer through instruction level par- 
allelism and speculative execution [Gniady et al. 1999; Ranganathan et al. 1997; 
Ranganathan et al. 1997]. The reasoning is that speculative rollback will generally 
occur in situations where the processes would be waiting on synchronization 
operations anyway, so little time would actually be lost. 

We believe that using weaker consistency models solely to simulate sequential 
consistency with an efficient implementation should not be the only goal of shared 
memory research. Our work is based on the idea of declarative consistency proper- 
ties weaker than sequential consistency, but still intuitively useful to programmers. 
Therefore, we found the formalisms of the "issue" method less useful to us. 

An alternative to the "issue" method is the "view" method where each process has 
a view of the order of events in the system. For example, PRAM consistency [Lipton 
and Sandberg 1988] states that each process must see all operations occur in 
an order that respects program order, but different processes may see different 
orders. This essentially places restrictions on when operations may become visible 
to other processes, and not on when they may be issued. For our purposes, the 
view method of defining consistency models is most appropriate. What matters is 
the possible orders of events from the process' (programmer's) point of view. The 
programmer does not care how the shared memory is implemented. If two different 
implementations produce the same set of possible views they should be considered 
equivalent. For this reason, our work uses view based definitions of consistency 
models. We believe they are more independent of implementation details. Several 
surveys of view based definitions have been presented in the literature [Bataller and 
Bernabeu 1997; Mosberger 1993; Tanenbaum 1995]. These view based definitions 
are presented in Subsections 2.2 and 2.3. 

The only prior comparison in the literature of the issue and view methods is 
by Ahamad et al. [Ahamad et al. 1992]. In their paper they compare 
Goodman's definition of processor consistency (which is view based) to the 
DASH definition (which is issue based). Their conclusion was that both definitions 
are weaker than sequential consistency, and stronger than both PRAM and cache 
consistency. This is the strength relationship commonly understood for processor 
consistency, and the two models have often been considered equivalent. However, 
Ahamad et al. showed that the two definitions are not equivalent, and are in fact 
incomparable. This showed that it is not trivial to compare consistency models de- 
fined under the two formalisms. More work relating the two formalisms is needed. 
However, this paper concentrates on view based definitions. Generally, issue based 
definitions have a view based definition that is analogous. 

The most closely related work to this paper is the Mume project [Bataller and 
Bernabeu-Auban 1998], which specifies three consistency properties (orderings): total 
order, total order with mutual exclusion, and causal order. The Mume project 
showed that these orderings can be used to provide an alternative and equivalent 
specification of existing consistency models. However, unlike our work, there is no 
notion of combining properties in arbitrary ways to produce a lattice of consistency 
models, or of consistency transitions within that lattice. 

2.2 Consistency Model Definitions 

Leslie Lamport defined sequential consistency [Lamport 1979]: 

Definition 2.1. A multiprocessor is Sequentially Consistent if the result of any 
execution is the same as if the operations of all the processors were executed in 
some sequential order, and the operations of each individual processor appear in 
this sequence in the order specified by its program. 

Lamport also gave two implementation requirements which, if met, would enforce 
sequential consistency. 

R1. Each processor issues memory requests in the order specified by its program. 

R2. Memory requests from all processors issued to an individual memory module 
are serviced from a single FIFO queue. Issuing a memory request consists of entering 
the request on this queue. 

Linearizability [Herlihy and Wing 1990] also called atomic memory [Lamport 
1986] is essentially sequential consistency with a real-time constraint. Each opera- 
tion is given a begin time and end time in reference to a global Newtonian clock. 
For an execution to be linearizable, it must be sequentially consistent, and the 
sequential total order must correspond to an order realizable by placing each 
operation at a single point in time between its begin and end times. Essentially, if two 
operations' time spans do not overlap they cannot be re-ordered even in the absence 
of any other dependency. Even though linearizability is stronger, sequential 
consistency is the strongest consistency model used in practice [Adve and Gharachorloo 
1996; Tanenbaum 1995]. Sequential consistency is considered strong enough for 
conventional reasoning about the correctness of shared memory programs. 

Journal of the ACM, Vol. V, No. N, Month 20YY. 

A Unified Theory of Shared Memory Consistency 

p1:  (w, p1, x, 1)    (r, p1, y, ⊥) 
p2:  (w, p2, y, 2)    (r, p2, x, ⊥) 

Fig. 2. An execution that is processor consistent, but not sequentially consistent. 

Lipton and Sandberg defined PRAM (Pipelined RAM) consistency [Lipton and 
Sandberg 1988], and Goodman defined cache consistency [Goodman 1989]: 

Definition 2.2. A multiprocessor is PRAM Consistent if writes performed by a 
single process are seen by all other processes in the order in which they were issued, 
but writes from different processes may be seen in different orders by different 
processes. 

Definition 2.3. A multiprocessor is Cache Consistent if all writes to the same 
memory location are performed in some sequential order. 

In the same paper Goodman defined processor consistency. 

Definition 2.4. A multiprocessor is Processor Consistent if it is PRAM consis- 
tent and writes to the same memory location are seen in the same sequential order 
by all processes. 

One consistency model is said to be stronger than another if every condition re- 
quired by the weaker model is also required by the stronger one. Thus, a stronger 
consistency model has a more highly constrained behavior than a weaker one. By 
considering the definitions, note that sequential consistency is strictly stronger than 
processor consistency, which is strictly stronger than both PRAM and cache consistency. 
However, PRAM and cache consistency are incomparable. This raises a puzzle. PRAM and 
cache consistency are very similar to Lamport's conditions R1 and R2, and enforcing R1 
and R2 enforces sequential consistency. Processor consistency enforces both PRAM 
consistency and cache consistency, yet processor consistency is weaker than sequential 
consistency. How can this be? 

Consider Figure 2. In this figure time proceeds from left to right, and variables 
are assumed to have an initial value of ⊥. Process p1 writes to x, and then reads 
from y. Likewise, process p2 writes to y, and then reads from x. Both processes 
read the initial value of the variable instead of each other's write. Each process 
perceives that its own write went first, so the execution is not sequentially consistent. 
However, it is processor consistent. There is only one write by p1 and one by p2 so it is 
trivially PRAM consistent. There is only one write to x and one write to y so it is trivially 
cache consistent. This example demonstrates how processor consistency is weaker 
than sequential consistency. Writes by different processes to different variables may 
be seen to occur in different orders. 

The question remains, does the execution in Figure 2 satisfy R1 and R2? The 
answer is no because R2 requires that read operations be placed in the queue along 
with write operations. Neither process can place its read operation in the queue 
until its write operation has been placed in the queue so at least one of the processes 
must read the other's write. On the other hand, processor consistency only requires 
that write operations become visible in the correct order. The write operations can 
be pending while each process does its read, and then the write operations are 
applied in the correct order. 
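
The claim that the execution in Figure 2 is not sequentially consistent can be checked by brute force. The sketch below is illustrative only: the tuple encoding of operations, the use of `None` for the initial value ⊥, and the function names are assumptions, not part of the formalism. It enumerates every total order of the four operations, keeps those that respect program order, and tests whether each read returns the most recent write in that order.

```python
from itertools import permutations

INITIAL = None  # stands in for the initial value written as ⊥ in the text

# Each operation is (kind, process, variable, value), listed in program order.
FIGURE_2 = {
    "p1": [("w", "p1", "x", 1), ("r", "p1", "y", INITIAL)],
    "p2": [("w", "p2", "y", 2), ("r", "p2", "x", INITIAL)],
}


def respects_program_order(order, execution):
    """True if each process's operations appear in program order."""
    for ops in execution.values():
        positions = [order.index(op) for op in ops]
        if positions != sorted(positions):
            return False
    return True


def reads_are_legal(order):
    """True if every read returns the latest preceding write to its variable."""
    memory = {}
    for kind, _proc, var, value in order:
        if kind == "w":
            memory[var] = value
        elif memory.get(var, INITIAL) != value:
            return False
    return True


def serial_views(execution):
    """All total orders that respect program order and explain every read."""
    all_ops = [op for ops in execution.values() for op in ops]
    return [order for order in permutations(all_ops)
            if respects_program_order(order, execution) and reads_are_legal(order)]
```

For `FIGURE_2` the list of serial views is empty, which matches the argument above: each read of ⊥ must precede the other process's write, and program order closes the cycle. If p2's read returned 1 instead of ⊥, exactly one serial view would exist.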

Causal memory [Ahamad et al. 1991] is a consistency model drawn from Lam- 
port's concept of potential causality [Lamport 1978]. Causal memory is weaker 
than sequential consistency, stronger than PRAM, and incomparable to processor 
and cache consistency. It was defined by Ahamad et al. as: 

Definition 2.5. A multiprocessor is Causally Consistent if for each process the 
operations of that process plus all writes known to that process appear to that 
process to occur in a total order that respects potential causality. Potential causality 
is as defined by Lamport [Lamport 1978] with writes interpreted as sends and reads 
interpreted as receives. 

Slow consistency [Hutto and Ahamad 1990] is weaker than both PRAM and cache 
consistency. 

Definition 2.6. A multiprocessor is slow consistent if reads must return some 
value that has been previously written to the location being read. Once a value has 
been read, no earlier writes to that location (by the processor that wrote the value 
read) can be returned. Writes by a process must be immediately visible to itself. 

Local consistency [Bataller and Bernabeu 1997] refers to the weakest consistency 
model for shared memory. 

Definition 2.7. A multiprocessor is Locally Consistent if each process' own oper- 
ations appear to occur in the order specified by its program. There is no restriction 
on the order in which writes by other processes appear to occur, and different 
processes may see different orders. 

It is important to note that every consistency model is stronger than local con- 
sistency and weaker than sequential consistency which is weaker than linearizabil- 
ity [Herlihy and Wing 1990]. This fact implies that consistency models could be 
placed in a lattice. 

2.3 Synchronized Consistency Models 

Some consistency models include explicit synchronization actions which are treated 
differently than ordinary memory operations. Synchronization operations are pro- 
cessed at a high level of consistency, usually sequential consistency. Ordinary op- 
erations are processed at a low level of consistency, usually slow consistency, but 
the presence of synchronization operations places additional ordering restrictions 
on ordinary operations. Dubois et al. defined weak consistency [Dubois et al. 
1986]. 

Definition 2.8. A multiprocessor is Weak Consistent if: 

(1) Accesses to global synchronizing variables are strongly ordered [sequentially 
consistent]. 

(2) No access to a synchronizing variable is issued in a processor before all previous 
global data accesses have been performed. 

(3) No access to global data is issued by a processor before a previous access to a 
synchronizing variable has been performed. 

An ordinary operation is issued either before or after a synchronization operation. 
All processes must see the ordinary operation occur in this order with respect to the 
synchronization operation. This provides a sufficient programming environment for 
constructs such as critical sections and barriers. For example, a barrier is defined 
to be a synchronization operation, and all operations issued before the barrier must 
appear to occur before the barrier. However, this condition is sometimes stronger 
than necessary. Synchronizing operations can be used just to import information, 
as with the acquiring of a lock, or just to export information, as with the release 
of a lock. Taking advantage of this as an opportunity for optimization leads to a 
different consistency model called release consistency [Gharachorloo et al. 1990]. 

Definition 2.9. A multiprocessor is Release Consistent if: 

(1) Before an ordinary LOAD or STORE access is allowed to perform with respect 
to any other processor, all previous acquire accesses must be performed. 

(2) Before a release access is allowed to perform with respect to any other processor, 
all previous ordinary LOAD and STORE accesses must be performed. 

(3) Special accesses [including acquire and release] are sequentially consistent with 
respect to one another. 

A process performs an acquire to get up to date information. Only that pro- 
cess is guaranteed to be up to date, and then only up to the point of the latest 
release on every other process. A different implementation called lazy release consis- 
tency [Keleher et al. 1992] enforces the same consistency model, but sends updates 
as late as possible. The distinction between release and weak consistency is that 
release forces the program to give more detailed instructions on what must be up to 
date at a synchronization. This trend is continued with entry consistency [Bershad 
and Zekauskas 1991] and scope consistency [Iftode et al. 1996]. In entry consis- 
tency [Bershad and Zekauskas 1991] each synchronization variable is associated 
with one or more ordinary variables. Acquires and releases only bring up to date 
those ordinary variables associated with a particular synchronization variable. In 
scope consistency [Iftode et al. 1996] this set of variables is not static, but rather 
any ordinary variables accessed between an acquire and release of a synchronization 
variable must be brought up to date to the point of the release on all subsequent 
acquires of the same synchronization variable. 

A final synchronized model called location consistency [Gao and Sarkar 2000] is 
significantly different. Location consistency is similar to entry consistency in that 
each ordinary variable is associated with a synchronization variable, and a release 
or acquire is ordered with an ordinary operation if their variables are associated. 
However, location consistency is different in that it allows the state of a variable to 
be a partial order, and not a total order. 

Fig. 3. An execution that is location, but not entry consistent. 

For example, in Figure 3 two processes both write to the variable x. In entry 
consistency, the order of these two writes is undefined. They could be seen to 
occur in either order, and two different processes do not have to agree on the order. 
However, there is an implicit assumption that for a single process the two operations 
occur in some order, and the second one overwrites the first. So, when p2 reads 1 
from x one can deduce that the order seen by p2 is: 

(w, p2, x, 2) < (w, p1, x, 1) < (r, p2, x, 1) 

Therefore, p2 will never again read from (w, p2, x, 2) because it has been overwritten. 
The operation (r, p2, x, 2) violates entry consistency, but not location consistency. 
Location consistency assumes that each process sees a partial order of writes, 
and any read can return the value of any write that is not dominated by another 
write. Writes are only ordered when they are by the same process, or when they 
are separated by a release-acquire pair. Therefore, under location consistency p2 
can continue forever alternately reading the values 1 and 2 from x barring further 
write, acquire, or release operations. The purpose of location consistency is that 
if a program separates every pair of competing writes with a release-acquire pair 
(called a data-race-free program) then it is equivalent to entry consistency, but still 
might be able to take advantage of the location model for efficiency optimizations. 
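
The rule that a read may return the value of any write not dominated by another write can be sketched as follows. This is an illustrative sketch only: the write labels, the `readable_values` helper, and the encoding of the order as a set of (earlier, later) pairs are assumptions introduced here, and the two writes are taken from the description of Figure 3 in the text.

```python
def readable_values(writes, ordered_pairs):
    """Return the values of writes that no other write dominates.

    writes: dict mapping a write label to the value it stores (one variable).
    ordered_pairs: set of (earlier, later) label pairs known to the reader.
    """
    dominated = {earlier for earlier, _later in ordered_pairs}
    return {value for label, value in writes.items() if label not in dominated}


# The two competing writes to x described for Figure 3.
FIGURE_3_WRITES = {"w_p1": 1, "w_p2": 2}

# Location consistency: the writes are by different processes and are not
# separated by a release-acquire pair, so neither dominates the other.
location_view = readable_values(FIGURE_3_WRITES, set())

# Entry consistency as described in the text: once p2 has read 1, its view
# orders (w, p2, x, 2) before (w, p1, x, 1), so the value 2 is overwritten.
entry_view = readable_values(FIGURE_3_WRITES, {("w_p2", "w_p1")})
```

Here `location_view` is `{1, 2}`, so p2 may keep alternating between the two values, while `entry_view` is `{1}`.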

3. A FORMALISM FOR SHARED MEMORY CONSISTENCY MODELS 

This section presents formal, declarative definitions of the well known consistency 
models introduced in Section 2. When a shared memory system satisfies a particular 
consistency model it must produce only executions acceptable to that model. In 
this way, a consistency model can be thought of as a criterion to accept or reject 
program executions. Therefore, a model can be defined by specifying its set of 
accepted executions. This is the technique we will use in the rest of the paper. 

In "view" based definitions of consistency models, memory operations must ap- 
pear to be processed in a certain order. For example, under sequential consistency, 
there must appear to be a single total order on all operations. Under cache consistency, 
there must appear to be a total order on the operations to each variable. 
Each process sees, through its read operations, a particular order of events in the 
memory system. However, each process has limited information because it may 
not read every write. Therefore, there could be many orders of events that would 
be consistent with the values returned by a process' reads. If any of these orders 
satisfies a consistency model then the process cannot prove that the memory sys- 
tem violated that model. If some acceptable order exists for every process then the 
execution must be accepted. The formalism used in this section is defined in the 
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Fig. 4. Examples for PRAM and Cache Consistency 



appendix and is taken from [Ahamad et al. 1992; Bataller and Bernabeu 1997]. 

Theorem 3.1. An execution is Sequentially Consistent iff 

$\exists\ \mathrm{SerialView}(<_{PO})$

For proof see [Bataller and Bernabeu 1997].

This restatement of sequential consistency corresponds very closely to the original 
definition of sequential consistency. There exists a serial view (total order) on all 
operations that respects <po (the process order of every process.) The actual 
execution may not have occurred in this order, but the values returned by the reads 
are exactly the same as the values that would have been returned had this been the 
execution order. Therefore, no process external to the memory system can prove 
that the execution did not actually happen in this order. In Figure 21(a) the given 
total order qualifies as the serial view proving that the execution is sequentially 
consistent. In Figure 21(b) it is easy to see that no such view could be constructed. 

Theorem 3.2. An execution is PRAM Consistent iff 

$\forall i \in P\ \exists\ \mathrm{SerialView}(<_{PO} \mid (*, i, *, *) \cup (w, *, *, *))$

For proof see [Bataller and Bernabeu 1997].

PRAM consistency requires that each process see a view that is consistent with 
the process order for all processes, but not all processes must see the same view. 
The operations visible to each process are its own reads and all writes. For this
reason the view of process i is restricted to $(*, i, *, *)$, all of its own operations,
and $(w, *, *, *)$, all writes. If a serial view conforming to process order can be
constructed for this subset of operations then this process cannot argue that the 
memory system has violated PRAM. If such a view can be constructed for every 
process then no external observer can argue that the memory system has violated 
PRAM. 

Theorem 3.3. An execution is Cache Consistent iff 

$\forall x \in V\ \exists\ \mathrm{SerialView}(<_{PO} \mid (*, *, x, *))$

For proof see [Bataller and Bernabeu 1997].

Cache consistency requires that for the operations on each variable, x, there is 
a serial view that respects process order. The views that must be constructed to 
satisfy the above definition are exactly the total orders required for the original 
definition. 
Consider Figure 4. The sets P, V, and O and the initial writes can usually be
deduced from the descriptions of process order and writes-to order. For this reason
they will be omitted in this and further examples unless required for clarity. In
Figure 4(a), both processes write and then read x, and both read the other's write.
This execution can be shown to be PRAM consistent by the following views.

$p_1: (w, p_1, x, 1) <_{p_1} (w, p_2, x, 2) <_{p_1} (r, p_1, x, 2)$
$p_2: (w, p_2, x, 2) <_{p_2} (w, p_1, x, 1) <_{p_2} (r, p_2, x, 1)$

This execution is not sequentially consistent. One would have to add $(r, p_2, x, 1)$
to $<_{p_1}$, or $(r, p_1, x, 2)$ to $<_{p_2}$. In $<_{p_1}$, $(r, p_2, x, 1)$ cannot come
before $(w, p_2, x, 2)$ because that would violate process order. It also cannot come
after $(w, p_2, x, 2)$ because then it would be after, but not reading from,
$(w, p_2, x, 2)$, which would violate the serial property. A similar argument can be
made for $<_{p_2}$. No single view can satisfy both processes so the execution is not
sequentially consistent.

In Figure 4(b) process 1 writes to both x and y while process 2 reads both x and
y. Process 2 reads process 1's second write, to y, and the initial value of x. This
execution can be shown to be cache consistent by the following views. Note, the
initial writes must be accounted for in all views, but are omitted in examples where
their placement is trivial. $(w, \epsilon, x, \bot)$ is shown in $<_x$ because its value
is later read.

$x: (w, \epsilon, x, \bot) <_x (r, p_2, x, \bot) <_x (w, p_1, x, 1)$
$y: (w, p_1, y, 2) <_y (r, p_2, y, 2)$

Figure 4(b) is not sequentially consistent. In a view with every operation,
$(w, p_1, x, 1)$ would have to come before $(w, p_1, y, 2)$ by process order.
$(w, p_1, y, 2)$ would have to come before $(r, p_2, y, 2)$ for the view to be serial.
$(r, p_2, y, 2)$ would have to come before $(r, p_2, x, \bot)$ by process order. This
implies $(r, p_2, x, \bot)$ would come after $(w, p_1, x, 1)$ but read from the initial
write, so the view could not be serial.

Also, 4(a) is not cache consistent, and 4(b) is not PRAM consistent. In 4(a)
all operations are on the same variable so there would need to be a serial view
on all operations. In disproving sequential consistency we have already shown this
is impossible. For 4(b) to be PRAM consistent, a view would need to be
constructed containing all of $p_1$'s writes and all of $p_2$'s operations. This would
include all of the operations, which has likewise been shown to be impossible.
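
For executions as small as those in Figure 4, the arguments above can be checked
by exhaustive search. The following Python sketch is not part of the formalism of
[Bataller and Bernabeu 1997]; it simply tries every permutation when looking for a
serial view. It assumes that every write to a variable stores a distinct value, so
that the writes-to relation can be recovered from the values, and it represents the
initial value as None. The function names and the encodings of Figure 4(a) and 4(b)
are illustrative and are taken from the prose description of the figure.

```python
from itertools import permutations

# An operation is (kind, process, variable, value); kind is "w" or "r".

def is_serial(order):
    """Every read returns the value of the most recent earlier write to its variable."""
    last = {}  # variable -> last written value (absent means the initial value, None)
    for kind, _proc, var, val in order:
        if kind == "w":
            last[var] = val
        elif last.get(var) != val:
            return False
    return True

def process_order(procs):
    """All pairs (a, b) where a precedes b in the same process."""
    return {(seq[a], seq[b])
            for seq in procs.values()
            for a in range(len(seq)) for b in range(a + 1, len(seq))}

def all_ops(procs):
    return [op for seq in procs.values() for op in seq]

def serial_view_exists(subset, order):
    """Brute-force test for a serial view on subset that respects order."""
    for perm in permutations(subset):
        pos = {op: k for k, op in enumerate(perm)}
        respects = all(pos[a] < pos[b] for a, b in order
                       if a in pos and b in pos)
        if respects and is_serial(perm):
            return True
    return False

def sequentially_consistent(procs):
    return serial_view_exists(all_ops(procs), process_order(procs))

def pram_consistent(procs):
    po = process_order(procs)
    return all(
        serial_view_exists(
            [op for op in all_ops(procs) if op[1] == i or op[0] == "w"], po)
        for i in procs)

def cache_consistent(procs):
    po = process_order(procs)
    variables = {op[2] for op in all_ops(procs)}
    return all(
        serial_view_exists([op for op in all_ops(procs) if op[2] == x], po)
        for x in variables)

# Figure 4(a) and 4(b) as described in the text (None stands for the initial value).
fig4a = {"p1": [("w", "p1", "x", 1), ("r", "p1", "x", 2)],
         "p2": [("w", "p2", "x", 2), ("r", "p2", "x", 1)]}
fig4b = {"p1": [("w", "p1", "x", 1), ("w", "p1", "y", 2)],
         "p2": [("r", "p2", "y", 2), ("r", "p2", "x", None)]}
```

Under these assumptions the search agrees with the discussion above: `fig4a` is
PRAM consistent but neither cache nor sequentially consistent, and `fig4b` is cache
consistent but neither PRAM nor sequentially consistent. The search is factorial in
the number of operations, so it is only meant for examples of this size.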

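Theorem 3.4. An execution $\alpha$ is Processor Consistent iff

$\forall x \in V\ \exists <_x = \mathrm{SerialView}(<_{PO} \mid (*, *, x, *))$, and

$\forall i \in P\ \exists\ \mathrm{SerialView}((\bigcup_{x \in V} <_x) \cup <_{PO} \mid (*, i, *, *) \cup (w, *, *, *))$

For proof see [Bataller and Bernabeu 1997].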

This restatement says that Processor consistency requires PRAM and cache con- 
sistency. It also requires that the PRAM and cache views be mutually consistent. 
The views that satisfy PRAM must conform not only to the process order, but to 
the view order of every variable enforced by cache consistency. This is equivalent 
to Goodman's definition of processor consistency. 

Definition 3.5. The Causal Relation, $<_{CR}$:

$\forall o_i, o_j \in O$, $o_i <_{CR} o_j$ iff
$o_i <_{PO} o_j$, or
$o_i \mapsto o_j$, or
$\exists o_k \in O$ such that $o_i <_{CR} o_k <_{CR} o_j$

Theorem 3.6. An execution $\alpha$ is Causally Consistent iff

$\forall i \in P\ \exists\ \mathrm{SerialView}(<_{CR} \mid (*, i, *, *) \cup (w, *, *, *))$

For proof see [Bataller and Bernabeu 1997].

Theorem 3.7. An execution $\alpha$ is Slow Consistent iff

$\forall i \in P, x \in V\ \exists\ \mathrm{SerialView}(<_{PO} \mid (*, i, x, *) \cup (w, *, x, *))$

For proof see [Bataller and Bernabeu 1997].

Theorem 3.8. An execution $\alpha$ is Locally Consistent iff

$\forall i \in P\ \exists\ \mathrm{SerialView}(<_{iLocal} \mid (*, i, *, *) \cup (w, *, *, *))$

For proof see [Bataller and Bernabeu 1997].
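
The causal relation of Definition 3.5 is the transitive closure of process order
united with writes-to order, which can be computed directly for small executions.
The Python sketch below is one way to do so; the function name and the three-process
execution are hypothetical and do not come from the paper.

```python
def causal_relation(process_order, writes_to):
    """Transitive closure of process order united with writes-to order."""
    cr = set(process_order) | set(writes_to)
    changed = True
    while changed:
        changed = False
        for a, b in list(cr):
            for c, d in list(cr):
                if b == c and (a, d) not in cr:
                    cr.add((a, d))
                    changed = True
    return cr

# Hypothetical execution; operations are (kind, process, variable, value)
# and None stands for the initial value.
w_x = ("w", "p1", "x", 1)
r_x = ("r", "p2", "x", 1)
w_y = ("w", "p2", "y", 2)
r_y = ("r", "p3", "y", 2)
r_x0 = ("r", "p3", "x", None)

po = {(r_x, w_y), (r_y, r_x0)}   # process order of p2 and of p3
wt = {(w_x, r_x), (w_y, r_y)}    # writes-to order
cr = causal_relation(po, wt)
```

In this example the write to x by p1 is causally ordered before p3's read of the
initial value of x, so no serial view for p3 can respect the causal relation, even
though the two operations are unrelated by process order alone.
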
3.1 Synchronized Consistency Models 

Synchronized consistency models require additional definitions. First of all,
operations are divided into two types, ordinary and synchronization operations. In
some models such as weak consistency, reads and writes are merely designated as
synchronization operations. In other models such as release consistency,
synchronization operations are new types of operations, acquire and release. In either
case, the operation type s is used to designate synchronization operations. For example,
$(s, *, *, *)$ designates the set of all synchronization operations whether those are
read, write, acquire, or release. Also, we need to explicitly state that the writes-to
relation is defined on synchronization operations. For this purpose, acquires are
treated as reads, and releases are treated as writes. Essentially, synchronization
operations must be aware of which acquire corresponds to which release. Defining
the writes-to relation in this way allows the existing definition of serial view to be
used for this purpose. Finally, for each synchronized consistency model, certain
ordinary operations must come before or after certain synchronization operations.

Definition 3.9. $D_-(s)$ denotes the set of ordinary operations that must come
before synchronization operation s. $D_+(s)$ denotes the set of ordinary operations
that must come after synchronization operation s. $<_D$ denotes the relation:

$o <_D s$ for every $o \in D_-(s)$, and $s <_D o$ for every $o \in D_+(s)$

Synchronized consistency models support different consistency for ordinary
operations than synchronization operations. For some models, ordinary operations are
processed under slow consistency, and for some models under cache consistency.
The authors of [Bataller and Bernabeu 1997] argue that this distinction is not a
significant design feature, but rather was primarily an artifact of the
implementation for which each model was originally defined. They present formal
definitions of all models assuming that ordinary operations are processed under slow
consistency. Synchronization operations are generally processed under sequential
consistency, although a variation of release consistency was presented where
synchronization operations were processed under processor consistency. Below, we assume
synchronization operations obey sequential consistency and ordinary operations obey
slow consistency. Variations will be dealt with in the section on consistency
transitions (see Section 5).

Every synchronized consistency model obeys the following condition. The only
difference between models is in the definition of $D_-(s)$ and $D_+(s)$.

Definition 3.10. For a given definition of $<_D$, an execution is synchronized model
consistent iff

$\exists <_{seq} = \mathrm{SerialView}(<_{PO} \mid (s, *, *, *))$, and
$<_S$ = the transitive closure of $<_D \cup <_{seq}$, and
$\forall i \in P, x \in V\ \exists\ \mathrm{SerialView}(<_S \cup <_{PO} \mid (*, i, x, *) \cup (w, *, x, *))$

Definition 3.10 says that a sequential order exists on all synchronization
operations, that the per-process, per-variable views required by slow consistency
exist, and that the slow consistent views respect the transitive closure of the
ordering $<_D$ and the sequential order of synchronization operations. We will now
discuss the differences between various consistency models.

In weak consistency [Dubois et al. 1986] there is only one synchronizing variable,
and there is no distinction between acquire and release types of synchronizing
operations. $D_+(s)$ orders after s any operation ordered after it by process order.
$D_-(s)$ orders before s any operation ordered before it by process order.

For release consistency [Gharachorloo et al. 1990] there is only one synchronizing
variable, but the distinction is made between acquire and release types of
synchronizing operations. $D_+(acquire)$ orders after acquire any operation ordered
after it by process order. $D_-(release)$ orders before release any operation ordered
before it by process order.
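
The difference between the weak and release consistency definitions of $D_-(s)$ and
$D_+(s)$ can be made concrete with a small Python sketch. It assumes that $<_D$
orders every operation in $D_-(s)$ before s and s before every operation in
$D_+(s)$, and it only considers the operations of a single process. The function
name `d_edges`, the "ord"/"acq"/"rel" tags, and the example sequence are
hypothetical; under the weak model both kinds of synchronization operation are
treated identically.

```python
def d_edges(seq, model):
    """Edges of <_D for one process's operations, listed in process order.

    seq   : list of (kind, label) pairs; kind is "ord" (ordinary),
            "acq" (acquire) or "rel" (release)
    model : "weak" or "release"
    """
    edges = set()
    for k, s in enumerate(seq):
        kind = s[0]
        if kind == "ord":
            continue
        before = [o for o in seq[:k] if o[0] == "ord"]
        after = [o for o in seq[k + 1:] if o[0] == "ord"]
        if model == "weak" or kind == "rel":
            edges |= {(o, s) for o in before}   # o is in D_-(s)
        if model == "weak" or kind == "acq":
            edges |= {(s, o) for o in after}    # o is in D_+(s)
    return edges

# Hypothetical process: write x, acquire, write y, release, write z.
seq = [("ord", "w x"), ("acq", "a"), ("ord", "w y"),
       ("rel", "r"), ("ord", "w z")]
weak = d_edges(seq, "weak")
release = d_edges(seq, "release")
```

In this example the weak model orders the write to x before the acquire and the
write to z after the release, while the release model leaves both pairs unordered.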

Lazy release consistency [Keleher et al. 1992] does not force operations before
a release to be ordered before that release, but they must be ordered before any
subsequent acquire. There is only one synchronizing variable. $D_+(acquire)$ orders
after acquire any operation ordered after it by process order. $D_-(acquire)$ orders
before acquire any ordinary operation for which there exists a release with
release $<_S$ acquire such that the ordinary operation is ordered before that release
by process order. No ordinary operations are directly ordered with any release.

In entry consistency [Bershad and Zekauskas 1991] there can be more than one
synchronization variable. Each ordinary variable is associated with a
synchronization variable. An ordinary operation is ordered with a synchronization
operation in the same way it would be by release consistency if and only if their
variables are associated.

In scope consistency [Iftode et al. 1996] there can be more than one synchroniza- 
tion variable. An ordinary operation is ordered with a synchronization operation 
in the same way it would be by release consistency if and only if there is no other 
synchronization operation to the same variable ordered between them by process 
order. Essentially, ordinary operations are only ordered with respect to the most 
recent acquire and the next release to each synchronization variable. 
Location consistency [Gao and Sarkar 2000] is different, but we will present it in
a formalism as close as possible to that used for the other models. One important
difference is that in location consistency synchronization operations are defined to
provide a mutual exclusion function. If one process performs an acquire, then no
other process may successfully perform an acquire until the first process performs a
release. All subsequent acquires are ordered after that release. This provides control
dependencies in addition to the data dependencies enforced by the consistency 
model. This exposes a fundamental difference of opinion about what the job of 
a consistency model should be. For example, under release consistency there is 
nothing to prevent two processes from both performing acquires and concurrently 
writing to the same variable. Release consistency specifies formally what data 
dependencies must be preserved by the memory system in that situation, i.e. the 
writes are unordered and can be seen in different orders by different processes. If 
the program truly needs mutual exclusion it can be included in the program code as 
a locking algorithm that works correctly under release consistency [Gharachorloo 
et al. 1990]. 

Most synchronized consistency models were written in two parts, the consistency 
model itself, and a programming paradigm such as properly labeled [Gharachorloo 
et al. 1990] or data-race-free [Adve and Hill 1993] programs. The guarantee pro- 
vided is that a program that obeys the programming paradigm executed on the 
consistency model will simulate sequential consistency. The authors of release con- 
sistency expected that it would be used in conjunction with control flow constructs 
in the program to simulate sequential consistency, but they did not directly embed 
the control flow into the consistency model. Instead they allowed the programmer 
to choose the appropriate control flow constructs. They also acknowledged that 
some programmers may not want to simulate sequential consistency, but rather 
deal directly with the semantics of release consistency. 

Control dependencies should be dealt with in the programming paradigm, and 
not the consistency model itself. It is unnecessary for a consistency model to force 
the programmer to use a particular control flow paradigm like mutual exclusion. 
The consistency model should only describe data dependencies. For any sequence 
of submitted operations the model gives the set of possible outcomes. It is not the 
job of the consistency model to restrict the sequences of operations that are allowed 
to be submitted. Any control dependencies can be independently enforced in the 
program. If the programmer really wants mutual exclusion the consistency model 
does not prevent this. This does not necessarily even make the programmer's job 
any harder as control flow constructs can be implemented in libraries of locking and 
barrier primitives. 

Synchronization operations in location consistency are similar to entry
consistency in that they are tagged with a variable, and only enforce dependencies with
ordinary operations to that variable. The mutual exclusion assumption stated
above requires that there is a total order on all synchronization operations to each
variable, so location consistency enforces at least cache consistency on
synchronization operations. However, the description of location consistency [Gao and
Sarkar 2000] does not specifically say that synchronization operations must obey
sequential consistency. There is no example in the paper with synchronization operations
to more than one variable so it is difficult to say whether the authors intended
synchronization operations to be sequentially consistent, or merely cache consistent.
For similarity with previous models sequential consistency is assumed.

We will now give the definition of the data dependencies implied by location
consistency assuming synchronization operations are sequentially consistent. The
definition will not include control dependencies implied by the mutual exclusion
paradigm. The definition will be equivalent to location consistency for programs
that conform to the mutual exclusion paradigm, and it will extend location
consistency for programs that do not conform to the mutual exclusion paradigm. In the
original definition of location consistency, due to the mutual exclusion requirement
there is an alternating order on the synchronization operations to each variable:
acquire, release, acquire, release, etc. Each acquire is immediately after a release,
which is called its most_recent_release, and immediately before a release by the
same process. The state of a variable, x, is defined to be a partial order, $\prec$,
which is the union of $<_{PO} \mid (s, *, x, *) \cup (w, *, x, *)$ and the condition
that all acquires to x are ordered after their most_recent_release.

Because $\prec$ is a partial order there may be many writes that could be considered
"most recent" in that there is no other write ordered after them. A read is allowed
to return a value written by any one of these most recent writes. More formally,
$\prec$ is augmented with any process order edges between the read and any operation
in $(s, *, x, *) \cup (w, *, x, *)$ to produce $\prec'$. Then, the read, r, may return
the value of any write, w, to the variable x such that $\nexists w'$ where
$w \prec' w' \prec' r$. To put this in a notation similar to that of the other
synchronized models, the first requirement is the same: synchronization operations
must be sequentially consistent.

$\exists <_{seq} = \mathrm{SerialView}(<_{PO} \mid (s, *, *, *))$, and

$<_S$ = the transitive closure of $<_D \cup <_{seq}$, where $<_D$ is defined the same
as entry consistency

For programs that obey mutual exclusion there is already a total order, $<_{seq}$,
on the synchronization operations to each variable. So $<_S$ is merely the
transitive closure of process order and most_recent_release order. Therefore, $<_S$ is
an equivalent definition of $\prec$. For programs that do not obey mutual exclusion,
this is a sensible extension of the definition of $\prec$ that maintains similarity
with other synchronized models. Now, location consistency defines the set of values
that may be returned by any read. To capture this, we will add to our formalism the
notion of a partial-ordered view.

Definition 3.11. There exists a serial partial view on a set of operations, subset,
respecting a partial order, <, denoted $\mathrm{SerialPartialView}(< \mid subset)$, iff

$\forall\, (w \mapsto r) \in subset,\ \nexists\, w' \in subset$ such that $w < w' < r$

A serial partial view is a minimal order; that is, it does not add any edges to <,
it just checks whether each read reads from a non-dominated write. This is unlike a
serial view, which must add edges to create a total order out of any partial order it
respects. The order, <, must still be a partial order. For example, there cannot
exist a serial partial view respecting a cyclic relation. Now, location consistency was
defined where each read had its own serial partial view. However, if a serial partial
view exists separately for two reads over the same set of writes and synchronization
operations, then those two reads can be added to the same partial order, and it
will still be a serial partial view. There is no interaction between the two reads.
Therefore, the condition that all reads read a permissible value can be stated as
follows.

$\forall i \in P, x \in V\ \exists\ \mathrm{SerialPartialView}(<_S \cup <_{PO} \mid (*, i, x, *) \cup (w, *, x, *))$

Therefore, the definition of location consistency is identical to the definition
of entry consistency with SerialView replaced by SerialPartialView.

[Figure: lattice with Sequential at the top and Local at the bottom]
Fig. 5. An initial consistency model lattice
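
The check performed by a serial partial view can be sketched in a few lines of
Python. The sketch assumes the reading of Definition 3.11 in which a read may
return a write w as long as no other write in the subset lies between w and the
read in the given order, and it assumes the subset holds operations on a single
variable, as in the condition for location consistency. The function names and the
execution are hypothetical; a fifth tuple field is added only to keep repeated
reads distinct.

```python
def transitive_closure(edges):
    """Smallest transitive relation containing edges (adequate for tiny examples)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def serial_partial_view(order, writes_to, subset):
    """Check that every read in subset reads from a non-dominated write.

    order     : set of (earlier, later) pairs
    writes_to : dict mapping each read to the write it reads from
    """
    subset = set(subset)
    lt = {(a, b) for a, b in transitive_closure(order)
          if a in subset and b in subset}
    if any(a == b for a, b in lt):
        return False                      # cyclic, so not a partial order
    writes = [op for op in subset if op[0] == "w"]
    for r, w in writes_to.items():
        if r in subset and any((w, w2) in lt and (w2, r) in lt for w2 in writes):
            return False                  # w is dominated by w2 before r
    return True

# Hypothetical execution: (kind, process, variable, value, index).
w1 = ("w", "p1", "x", 1, 0)
w2 = ("w", "p2", "x", 2, 0)
r_a = ("r", "p2", "x", 1, 1)
r_b = ("r", "p2", "x", 2, 2)
r_c = ("r", "p2", "x", 1, 3)
ops = [w1, w2, r_a, r_b, r_c]
po = {(w2, r_a), (r_a, r_b), (r_b, r_c)}     # process order of p2
reads_from = {r_a: w1, r_b: w2, r_c: w1}

unordered = serial_partial_view(po, reads_from, ops)
ordered = serial_partial_view(po | {(w1, w2)}, reads_from, ops)
```

With the two writes unordered, p2 may alternate between the values 1 and 2, so
`unordered` is true. Once the first write is ordered before the second, for example
by a release-acquire pair, the reads of 1 are dominated and `ordered` is false.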

4. CONSISTENCY PROPERTIES 

Some existing consistency models have been viewed as a combination of other mod- 
els. For example, processor consistency [Goodman 1989] is the combination of 
PRAM and Cache consistency. Causal ordering [Ahamad et al. 1991] is the tran- 
sitive closure of process order and writes-to order. Lamport's original definition 
of sequential consistency [Lamport 1979] included a pair of properties which, if 
independently enforced, would enforce sequential consistency. This suggests that 
perhaps many existing consistency models could be viewed as different combinations
of a few primitive consistency properties. In this section we define four such
properties. Global Process Order (GPO) is the condition that there is global agree- 
ment on the order of operations from each process. Global Data Order (GDO) is 
the condition that there is global agreement on the order of operations to each vari- 
able. Global Write-read-write Order (GWO) is the condition that there is global 
agreement on the order of potentially causally related writes. Global Anti Order 
(GAO) is the condition that there is global agreement on the order of any two writes 
when a process can prove it has read one before the other. Any combination of 
these properties results in a consistency model. Enumerating these models results 
in the lattice shown in Figure 13. 

For pedagogical purposes, we will start with the lattice shown in Figure 5 and 
expand the lattice as properties are developed. The top of the lattice is defined 
to be sequential consistency, and the bottom is defined to be local consistency as 
these are the strongest and weakest properties in the literature (see Section 2.) 

4.1 Processor Consistency as a Combination of Properties 

Processor Consistency is defined to be a combination of PRAM and cache
consistency (see Theorem 3.4). The given definition of processor consistency
requires constructing per-variable views to satisfy cache consistency in addition to
per-process views to satisfy PRAM consistency. To remove this inconvenience, we
will define two properties, one equivalent to PRAM consistency, and one
equivalent to cache consistency, such that both properties can be combined in the same
per-process views.

Definition 4.1. An execution is Global Process Order (GPO) iff 

$\forall i \in P\ \exists\ \mathrm{SerialView}(<_{iLocal} \cup <_{PO} \mid (*, i, *, *) \cup (w, *, *, *))$

Theorem 4.2. GPO is equivalent to PRAM consistency. 

Proof: The definitions of GPO and PRAM are identical. The views for
GPO are required to respect local order for similarity with the properties
to follow. However, this requirement is redundant because process order
is a superset of local order for any process.

Definition 4.3. Two operations are ordered by data order, $o_1 <_{DO} o_2$, iff they
are to the same variable, and either

(1) $o_1 <_{PO} o_2$, or

(2) $o_1 \mapsto o_2$, or

(3) There exists a read, r, to the same variable such that $o_1 <_{PO} r$, $o_1$ has a
different value than r, and $o_2 \mapsto r$, or

(4) There exists an operation, o, such that $o_1 <_{DO} o <_{DO} o_2$.

Data order captures the restrictions involved in constructing the required views
for cache consistency. The operations $o_1$ and $o_2$ can be either reads or writes, but
must be to the same variable. Data order contains writes-to order and process
order restricted to pairs of operations to the same variable because the views for
cache consistency must be serial and respect process order. For the third condition,
a particular process reads or writes a value, $o_1$, and then at a later time reads a
different value from the same variable, r. That process can deduce that a write, $o_2$,
must have occurred between those two operations and so the restriction is included
in data order. The fourth condition requires that data order is a transitive closure.

Definition 4.4. An execution is Global Data Order (GDO) iff 

$\forall i \in P\ \exists\ \mathrm{SerialView}(<_{iLocal} \cup <_{DO} \mid (*, i, *, *) \cup (w, *, *, *))$

The proof that GDO is equivalent to cache consistency uses several lemmas: 

Lemma 4.5. If an execution is Cache Consistent then Data Order is acyclic. 

Proof: Data order only contains edges between pairs of operations to 
the same variable. Therefore, if data order were cyclic, the cycle would 
have to involve only operations to a single variable. Cache consistency 
requires for every variable a serial view respecting process order on all 
the operations to that variable. We will show that these views must also 
respect data order, and so data order is acyclic. 

The cache consistent view for a variable respects process order by
definition and writes-to order because it is serial so it respects the first two
conditions of data order. The third condition of data order must also be
respected. If $o_1$ is process ordered before r it must come before r in the
view. If, in addition, $o_1$ has a different value than r, and $o_2$ writes-to r,
then $o_1$ must come before $o_2$ in the view. If not and $o_1$ is a write then
r does not read from the most recent write so the view is not serial. If
not and $o_1$ is a read then either $o_2$ is the most recent write before $o_1$, in
which case $o_1$ does not read from the most recent write, or there is
another write between $o_2$ and $o_1$. This write is also between $o_2$ and r, and
it has the same value as $o_1$ which is different than r, so r does not read
from the most recent write and the view is not serial. Since the view
is a total order and it respects the first three conditions of data order
it must respect their transitive closure which is the fourth condition of
data order.

The views required for cache consistency must respect data order. If
data order contained a cycle then the view for some variable could not be
constructed and the execution would not be cache consistent. Therefore,
if the execution is cache consistent data order is acyclic.

Lemma 4.6. If two reads are ordered by data order then either they are by the
same process, or they are ordered by a transitive chain containing a write. 

Proof: Two reads cannot be ordered by writes-to, or by having one 
write to a read that the other is process ordered before. So the only 
way two reads can be data ordered is by process order, or a transitive 
chain. If two reads are not by the same process, and are data ordered 
take the last operation in the transitive chain. If this operation is a 
write it satisfies the lemma. Otherwise, it must be a read by the same 
process as the final read. By the same logic the next to last operation 
in the chain must also be a write, or a read by the same process as the 
final read. By induction, if there is no write in the chain then the first 
operation in the chain must be a read by the same process as the final 
read. Therefore, if the two reads are not by the same process there must 
be a write in the transitive chain. 

Lemma 4.7. If an execution is GDO then data order is acyclic. 

Proof: GDO requires a view for every process that is serial and respects 
data order over the subset of all operations by that process plus all 
writes. If these views are constructible then data order must be acyclic 
at least on the subsets of operations in each view. Therefore, if data 
order is cyclic then the cycle must contain at least two read operations 
by different processes, $r_1$ and $r_2$, such that $r_1 <_{DO} r_2$ and $r_2 <_{DO} r_1$.
By Lemma 4.6 these two reads must be ordered by two transitive chains, 
and each chain must contain a write. Because data order is a transitive 
closure there must be a cycle between the writes in the two chains. This 
makes it impossible to construct the views required for GDO because 
every view must include all writes. If data order is cyclic then the views 
required for GDO cannot be constructed. Therefore, if an execution is 
GDO then data order is acyclic. 

Lemma 4.8. If data order is acyclic then

$\forall x \in V\ \exists <_x = \mathrm{SerialView}(<_{DO} \mid (*, *, x, *))$

Proof: First, collect all the operations on a single variable and place
them into groups where each group contains a write and all reads that
the write writes-to. Order the operations in each group in an order
that respects data order. This is possible because data order is acyclic.
The reads in a group all read from the write in that group so the write
will be ordered first in each group. The serial view for that variable is
constructed by ordering the groups with no interleaving of operations
between different groups. For every read, the most recent write to the
same variable must be the write from its group which is the write which
wrote-to it so the view must be serial. Any order of the groups with no
interleaving will produce a serial view.

If $G_1$ and $G_2$ are two groups then define group order, $<_{GO}$, as:
$G_1 <_{GO} G_2$ iff $\exists o_1 \in G_1, o_2 \in G_2$ such that $o_1 <_{DO} o_2$.
If group order is acyclic then any topological sort on group order will produce a
view that respects data order and is serial. Assume there is a cycle in group
order, but not in data order. Take any two ordered groups from the cycle,
$G_1 <_{GO} G_2$. We will show that the writes from the groups, $w_1$ and $w_2$,
must be ordered by data order. Therefore, any cycle in group order must
be accompanied by a cycle in data order. So if data order is acyclic then
group order must be acyclic and the views can be constructed.

There must be operations from the two groups such that $o_1 <_{DO} o_2$.
Either $o_1$ is $w_1$, or $o_1$ is a read that $w_1$ writes-to. So $w_1 <_{DO} o_2$. Also,
either $o_2$ is $w_2$, in which case $w_1 <_{DO} w_2$, or $o_2$ is a read that $w_2$
writes-to. If $o_2$ is a read consider how it came about that $w_1$ is data ordered
before $o_2$. $w_1$ did not write to $o_2$ so either $w_1 <_{PO} o_2$, or they are
ordered by a transitive chain. If $w_1 <_{PO} o_2$ then $w_1 <_{DO} w_2$ because
$w_2 \mapsto o_2$. If not, let o be the last operation in the transitive chain so
$w_1 <_{DO} o <_{DO} o_2$. Either $o \mapsto o_2$, in which case o is $w_2$ and
$w_1 <_{DO} w_2$, or $o <_{PO} o_2$, in which case $o <_{DO} w_2$ because
$w_2 \mapsto o_2$, so $w_1 <_{DO} w_2$.

Therefore, if $G_1 <_{GO} G_2$ then $w_1 <_{DO} w_2$. Any cycle in group order
will be reflected in data order by the writes. If data order is acyclic then 
there can be no cycle in group order, and a topological sort of the groups 
respecting group order will produce the required serial view respecting 
data order for that variable. 

Lemma 4.9. If the serial views, $<_x$, defined in Lemma 4.8 exist then

$\forall i \in P$, $(\bigcup_{x \in V} <_x) \cup <_{iLocal}$ is acyclic.

Proof: Assume that a cycle exists for some process, i. The views, <x, 
are acyclic, and their union cannot contain a cycle because no operation 
is in more than one view. Therefore, the cycle must have an edge in 
the local order, and thus include operations by process i. Pick any 
operation by process i in the cycle. Call it o. Follow the edges that 
make up the cycle. If you follow an edge in local order then you must 
reach an operation by process i that occurs after o in local order. If you 
follow an edge not in local order then you must reach an operation to 
the same variable as o by a process other than i. Operations by other 
processes are not ordered by local order, and thus the cycle must proceed
through operations to the same variable following edges of <x for that 
variable until reaching an operation by process i. This operation must 
be to the same variable as o, and must be ordered after o by local order. 
Otherwise, the view for that variable would not respect data order. 

In any case, the first operation by process i encountered after o in the 
cycle must be after o in local order. Call this operation o'. By the
same logic the next operation by process i after o' in the cycle must be 
ordered after o' and o by local order. By induction, every operation by 
process i in the cycle must be after o in local order. Eventually the cycle 
will reach o itself showing that there is a cycle in local order which is a 
contradiction. 

Lemma 4.10. If the views, $<_x$, defined in Lemma 4.8 exist then the execution is
GDO. 

Proof: Construct the view for process i required for GDO as any topological
sort of $(*, i, *, *) \cup (w, *, *, *)$ respecting
$(\bigcup_{x \in V} <_x) \cup <_{iLocal}$. This
is possible because by Lemma 4.9 the relation is acyclic. The views will
be serial because the views, $<_x$, are serial, and the relative position of
all pairs of operations to the same variable is preserved. Data order only
contains edges between two operations to the same variable and so is a
subset of $\bigcup_{x \in V} <_x$. Therefore, the constructed views respect local order
and data order, and they are serial so the execution is GDO.

Lemma 4.11. If the views, $<_x$, defined in Lemma 4.8 exist then the execution is 
cache consistent. 

Proof: Process order restricted to the set of operations on a single 
variable is a subset of data order. Therefore, any view on $(*, *, x, *)$ 
that respects data order will also respect process order. Therefore, the 
views defined in Lemma 4.8 respect process order, and so prove that the 
execution is cache consistent. 

Theorem 4.12. An execution is cache consistent iff it is GDO iff data order 
is acyclic. 

Proof: Follows directly from lemmas 4.5, 4.7, 4.8, 4.10, and 4.11. 

This is an important result because it provides two new ways to define cache 
consistency. One can determine whether an execution is cache consistent by the 
original method of constructing per-variable serial views, or now by constructing 
per-process serial views, or even by testing the cyclicity of the data order relation. 
Now that cache consistency is defined over per-process views we can combine GPO 
and GDO more easily. 

Definition 4.13. An execution is GPO+GDO iff 

$\forall i \in P \; \exists\, \mathrm{SerialView}(<_{iLocal} \cup <_{PO} \cup <_{DO} \,|\, (*, i, *, *) \cup (w, *, *, *))$ 

However, GPO+GDO is not quite equivalent to Goodman's definition of processor 
consistency. Processor consistency requires that all processes agree on a total 
order of operations to each variable. In Figure 6, the processes cannot agree on the 
order of the writes to z. If $(w, p_1, z, 2)$ was first, then $p_2$ should have read 1 from x. 
Likewise, if $(w, p_2, z, 4)$ was first, then $p_1$ should have read 3 from y. However, the 
two writes to z are not ordered by data order. Even under processor consistency 
they are allowed to occur in either order, but GPO+GDO does not enforce that 
they be seen in the same order by all processes. This can be solved by creating 
augmented data order, $<_{DO'}$. Augmented data order is any superset of data order 
that enforces a total order on all operations to each variable. By Theorem 4.12, 
any GDO execution respects at least one augmented data order. The problem is 
that there may be more than one, and a single augmented data order may not 
be consistent with process order at all sites. GPO+GDO' is defined similarly to 
GPO+GDO. 

$(w, p_1, x, 1) <_{PO} (w, p_1, z, 2) <_{PO} (r, p_1, y, \bot)$ 
$(w, p_2, y, 3) <_{PO} (w, p_2, z, 4) <_{PO} (r, p_2, x, \bot)$ 
$(w, \varepsilon, x, \bot) \mapsto (r, p_2, x, \bot)$ 
$(w, \varepsilon, y, \bot) \mapsto (r, p_1, y, \bot)$ 

Fig. 6. A GPO+GDO, but not processor consistent execution. 

Theorem 4.14. Goodman's definition of processor consistency (as given in 
Subsection 2.2) is equivalent to GPO+GDO'. 

Proof: Augmented data order is equivalent to the per-variable cache 
consistency views required for processor consistency. The per-process 
views for GPO+GDO' respect process order and augmented data order. 
The per-process views for processor consistency respect process order, 
and the per-variable cache consistent views. Therefore, the two required 
sets of views are equivalent. 

Augmented data order solves the problem of equivalence to Goodman's definition 
of processor consistency. However, we feel that even without augmented data order 
GPO+GDO is in line with the intended purpose of process order. In Figure 6 the 
writes to z are unordered. Inserting reads to z to detect the order of the writes would 
create data order dependencies and eliminate the need for augmented data order. Is 
it likely that the correctness of a program would depend on the fact that those two 
operations are seen in the same order by all processes when their order is unknown? 
Also, the execution in Figure 6 was taken from [Ahamad et al. 1992] as an example of 
an execution accepted by the DASH definition of processor consistency, and rejected 
by Goodman's definition. The space of consistency models surrounding processor 
consistency has not been completely searched. We believe that GPO+GDO will 
prove to be a useful consistency model, and a systematic examination of this search 
space will lead to greater understanding of the foundations of consistency models. 

Alternatively, GPO+GDO is equivalent to the following modified definition of 
processor consistency where the $\forall i \in P$ is moved outside of the $\forall x \in V$, and each process 
respects a set of cache consistent views, but all processes do not have to respect 
the same set of views. 

$\forall i \in P \; \forall x \in V \; \exists <_x \,=\, \mathrm{SerialView}(<_{PO} \,|\, (*, *, x, *))$, and 

$\exists\, \mathrm{SerialView}((\bigcup_{x \in V} <_x) \cup <_{PO} \,|\, (*, i, *, *) \cup (w, *, *, *))$ 

Fig. 7. A consistency model lattice including processor consistency 

$(w, p_1, x, 1) <_{PO} (w, p_1, y, 2)$ 
$(r, p_2, y, 2) <_{PO} (w, p_2, x, 3)$ 
$(r, p_3, x, 3) <_{PO} (r, p_3, x, 1)$ 
$(w, p_1, x, 1) \mapsto (r, p_3, x, 1)$ 
$(w, p_1, y, 2) \mapsto (r, p_2, y, 2)$ 
$(w, p_2, x, 3) \mapsto (r, p_3, x, 3)$ 

Fig. 8. A PRAM and cache, but not GPO+GDO consistent execution. 

This issue is discussed in more detail in Section 5. The same issue comes up 
when defining synchronized consistency models as consistency transitions. The 
synchronization operations must be sequentially consistent, but there may be more 
than one total order that would satisfy sequential consistency. The ordinary 
operations are not required to be sequentially consistent, and may demonstrate that 
different processes saw different sequential orders even though the synchronization 
operations in isolation are sequentially consistent. 

GPO+GDO begins a framework for defining consistency properties (see 
Figure 7). Any property that can be defined as a relation which must be respected 
by per-process views can be combined with process order and data order to create 
new consistency models. 

There can also be executions that are GPO and GDO, but not GPO+GDO. 
Ahamad et al. [Ahamad et al. 1992] provide the execution in Figure 8, which is 
PRAM and cache consistent, but not processor consistent. The execution is GPO 
because of the following views. 

$p_1$: $(w, p_2, x, 3) <_{p_1} (w, p_1, x, 1) <_{p_1} (w, p_1, y, 2)$ 

$p_2$: $(w, p_1, x, 1) <_{p_2} (w, p_1, y, 2) <_{p_2} (r, p_2, y, 2) <_{p_2} (w, p_2, x, 3)$ 

$p_3$: $(w, p_2, x, 3) <_{p_3} (r, p_3, x, 3) <_{p_3} (w, p_1, x, 1) <_{p_3} (r, p_3, x, 1) <_{p_3} (w, p_1, y, 2)$ 

$(w, p_1, x, 1) <_{PO} (r, p_1, y, 3) <_{PO} (r, p_1, x, 1)$ 
$(r, p_2, x, 1) <_{PO} (w, p_2, x, 2) <_{PO} (w, p_2, y, 3)$ 
$(w, p_1, x, 1) \mapsto (r, p_1, x, 1)$ 
$(w, p_1, x, 1) \mapsto (r, p_2, x, 1)$ 
$(w, p_2, y, 3) \mapsto (r, p_1, y, 3)$ 

Fig. 9. An Execution That Violates Causal Consistency 

Data order is as follows. 

$(w, p_2, x, 3) <_{DO} (r, p_3, x, 3) <_{DO} (w, p_1, x, 1) <_{DO} (r, p_3, x, 1)$ 

$(w, p_1, y, 2) <_{DO} (r, p_2, y, 2)$ 

The execution is GDO because of the following views. 

$p_1$: $(w, p_2, x, 3) <_{p_1} (w, p_1, x, 1) <_{p_1} (w, p_1, y, 2)$ 

$p_2$: $(w, p_1, y, 2) <_{p_2} (r, p_2, y, 2) <_{p_2} (w, p_2, x, 3) <_{p_2} (w, p_1, x, 1)$ 

$p_3$: $(w, p_2, x, 3) <_{p_3} (r, p_3, x, 3) <_{p_3} (w, p_1, x, 1) <_{p_3} (r, p_3, x, 1) <_{p_3} (w, p_1, y, 2)$ 

However, in $<_{p_2}$ the position of $(w, p_1, x, 1)$ is different between the GPO and 
GDO views. There is no view $<_{p_2}$ that conforms to both $<_{PO}$ and $<_{DO}$: 

$(w, p_1, x, 1) <_{PO} (w, p_1, y, 2) <_{DO} (r, p_2, y, 2) <_{PO} (w, p_2, x, 3) <_{DO} (w, p_1, x, 1)$ 

so $<_{DO} \cup <_{PO}$ has a cycle. $<_{p_2}$ must contain all of these operations and thus 
cannot be constructed. This leads to the definition of another consistency model. 

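Definition 4.15. An execution is GPO∩GDO iff 

$\forall i \in P \; \exists\, \mathrm{SerialView}(<_{iLocal} \cup <_{PO} \,|\, (*, i, *, *) \cup (w, *, *, *)) \;\wedge$ 
$\forall i \in P \; \exists\, \mathrm{SerialView}(<_{iLocal} \cup <_{DO} \,|\, (*, i, *, *) \cup (w, *, *, *))$ 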

Any pair of properties can be combined in this way creating a new consistency 
model. The meaning of these models has not been explored previously in the 
literature, and we have not explored them in our work. They are mentioned here 
for completeness. 
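
The cycle argument used for Figure 8 can be checked mechanically. The following is a minimal sketch, not part of the original development: it encodes the process order and data order edges listed for Figure 8 as sets of pairs and tests the union for a cycle with the standard-library topological sorter. The tuple encoding of operations and the names `has_cycle`, `PROCESS_ORDER`, and `DATA_ORDER` are our own assumptions.

```python
from graphlib import CycleError, TopologicalSorter

# Operations are (kind, process, variable, value) tuples, as in the figures.
W1X = ("w", "p1", "x", 1)
W1Y = ("w", "p1", "y", 2)
R2Y = ("r", "p2", "y", 2)
W2X = ("w", "p2", "x", 3)
R3X3 = ("r", "p3", "x", 3)
R3X1 = ("r", "p3", "x", 1)

# An edge (a, b) means "a is ordered before b". Both sets are read off
# the Figure 8 listing and the data order chains given for it.
PROCESS_ORDER = {(W1X, W1Y), (R2Y, W2X), (R3X3, R3X1)}
DATA_ORDER = {(W2X, R3X3), (R3X3, W1X), (W1X, R3X1), (W1Y, R2Y)}


def has_cycle(edges):
    """Return True if the relation, given as (before, after) pairs, is cyclic."""
    sorter = TopologicalSorter()
    for before, after in edges:
        sorter.add(after, before)  # 'after' has 'before' as a predecessor
    try:
        sorter.prepare()  # raises CycleError when no topological sort exists
    except CycleError:
        return True
    return False


print(has_cycle(PROCESS_ORDER))               # each order alone is acyclic
print(has_cycle(DATA_ORDER))
print(has_cycle(PROCESS_ORDER | DATA_ORDER))  # the union is not
```

Each relation is acyclic on its own, which matches the execution being both GPO and GDO, while the union is cyclic, which is why no single per-process view can respect both.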

4.2 Causal Consistency as a Combination of Properties 

Causal consistency is stronger than GPO, but incomparable to GPO+GDO. Therefore, 
there should be a property that enforces that part of causal consistency not already 
covered by GPO. Causal consistency depends on the causal relation which is the transitive 
closure of process order and writes-to order. The causal relation is made up of three 
types of edges: edges in process order, edges in writes-to order, and edges not in 
either order, but in the transitive closure. Process order has already been identified 
as a primitive property, and any serial view respects writes-to order. Therefore, 
we now define another property which contains the edges in the transitive closure. 
This new property can be used with process order to define causal consistency. 

For example, Figure 9 contains an execution that is not causally consistent even 
though the following serial views respect both process order and writes-to order. 

$p_1$: $(w, p_2, x, 2) <_{p_1} (w, p_2, y, 3) <_{p_1} (w, p_1, x, 1) <_{p_1} (r, p_1, y, 3) <_{p_1} (r, p_1, x, 1)$ 

$p_2$: $(w, p_1, x, 1) <_{p_2} (r, p_2, x, 1) <_{p_2} (w, p_2, x, 2) <_{p_2} (w, p_2, y, 3)$ 

Fig. 10. Enumerated Possibilities for a Causal Transitive Chain 

There is a causal dependency from $(w, p_1, x, 1)$ to $(w, p_2, x, 2)$ because 

$(w, p_1, x, 1) \mapsto (r, p_2, x, 1) <_{PO} (w, p_2, x, 2)$. 

However, $<_{p_1}$ places them in the opposite order because $<_{p_1}$ does not contain 
the operation $(r, p_2, x, 1)$ which is a read operation by $p_2$. Therefore, it violates 
neither process order nor writes-to order among the operations in its view. To be 
causally consistent the view for each process must respect: 

(the transitive closure of $<_{PO} \cup \mapsto$) $|\, (*, i, *, *) \cup (w, *, *, *)$ 

The definition for GPO already respects: 

the transitive closure of ($<_{PO} \cup \mapsto \,|\, (*, i, *, *) \cup (w, *, *, *)$) 

Note the different parentheses. The new property can be found in the set 
difference of these two relations. For an edge to be in the first relation and not the 
second, two operations in $(*, i, *, *) \cup (w, *, *, *)$ must be transitively ordered by a 
chain of operations not in $(*, i, *, *) \cup (w, *, *, *)$. The only operations not in that 
set are reads by a process other than i. Reads cannot be ordered with each other 
by writes-to order, and if a chain of reads is ordered by process order they must all 
be by the same process, and the first and last reads in the chain will be ordered. 
So, any transitive chains of the sort we are interested in must have an operation, $o_1$, 
ordered by process order or writes-to order before a read, $r_1$, possibly ordered by 
process order before another read, $r_2$, ordered by process order or writes-to order 
before an operation, $o_2$. All possibilities are summarized in Figure 10: 

Cases 1, 2, 3, and 4 are impossible because a read cannot be on the left hand side 
of a writes-to relation. In cases 5 and 6, the two operations, $o_1$ and $o_2$, are already 
ordered by process order. In case 7, $r_1$ and $o_2$ are ordered by process order so it 
reduces to case 8. Therefore, the only case that must be considered is case 8. 

In case 8, $o_1$ must be a write because it writes to $r_1$. $o_2$ is in the set 
$(*, i, *, *) \cup (w, *, *, *)$. $r_1$ is not in this set and so is not by process i. $o_2$ is by the same 
process as $r_1$ so it must be a write by another process. Therefore, only causal 
chains between two writes must be considered. 

Definition 4.16. Two writes are ordered by write-read-write order, $w_1 <_{WO} w_2$, 
iff there exists a read, $r$, such that $w_1 \mapsto r <_{PO} w_2$. 

Fig. 11. A consistency model lattice including causal consistency 

Definition 4.17. An execution is Global Write-read-write Order (GWO) iff 

$\forall i \in P \; \exists\, \mathrm{SerialView}(<_{iLocal} \cup <_{WO} \,|\, (*, i, *, *) \cup (w, *, *, *))$ 
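
As an illustration of Definition 4.16, the sketch below derives write-read-write order for the execution of Figure 9 and checks the view given earlier for $p_1$ against it. This is only a sketch under our own encoding: operations are tuples, relations are sets of (before, after) pairs, and the helper names `process_order`, `write_read_write_order`, and `respects` are hypothetical.

```python
from itertools import combinations

# Per-process operation sequences for Figure 9.
PROGRAMS = {
    "p1": [("w", "p1", "x", 1), ("r", "p1", "y", 3), ("r", "p1", "x", 1)],
    "p2": [("r", "p2", "x", 1), ("w", "p2", "x", 2), ("w", "p2", "y", 3)],
}
# Writes-to edges (write, read) for Figure 9.
WRITES_TO = {
    (("w", "p1", "x", 1), ("r", "p1", "x", 1)),
    (("w", "p1", "x", 1), ("r", "p2", "x", 1)),
    (("w", "p2", "y", 3), ("r", "p1", "y", 3)),
}
# The serial view for p1 from the text: p1's operations plus all writes.
VIEW_P1 = [
    ("w", "p2", "x", 2), ("w", "p2", "y", 3), ("w", "p1", "x", 1),
    ("r", "p1", "y", 3), ("r", "p1", "x", 1),
]


def process_order(programs):
    """All (earlier, later) pairs within each process's sequence."""
    return {pair for ops in programs.values() for pair in combinations(ops, 2)}


def write_read_write_order(writes_to, po):
    """Pairs (w1, w2) such that w1 writes-to some read r and r <PO w2."""
    return {
        (w1, w2)
        for (w1, r) in writes_to
        for (earlier, w2) in po
        if earlier == r and w2[0] == "w"
    }


def respects(view, relation):
    """True if `view` orders every related pair it contains consistently."""
    position = {op: k for k, op in enumerate(view)}
    return all(
        position[a] < position[b]
        for a, b in relation
        if a in position and b in position
    )


PO = process_order(PROGRAMS)
WO = write_read_write_order(WRITES_TO, PO)
print(respects(VIEW_P1, PO), respects(VIEW_P1, WO))
```

Under this encoding the view respects process order but not write-read-write order, because it places $(w, p_2, x, 2)$ before $(w, p_1, x, 1)$.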

Theorem 4.18. GPO+GWO is equivalent to causal consistency. 

Proof: By the logic above, the transitive closure of $<_{PO} \cup <_{WO} \cup \mapsto 
\,|\, (*, i, *, *) \cup (w, *, *, *)$ is equivalent to $<_{CR} \,|\, (*, i, *, *) \cup (w, *, *, *)$. Any 
serial view respects $\mapsto$, and a view is a total order so if it respects a 
relation it respects the transitive closure of that relation. Also, any view 
that respects $<_{PO}$ respects $<_{iLocal}$. So a serial view respects $<_{iLocal} 
\cup <_{PO} \cup <_{WO} \,|\, (*, i, *, *) \cup (w, *, *, *)$ iff it respects $<_{CR} \,|\, (*, i, *, *) \cup 
(w, *, *, *)$. The first is the requirement for GPO+GWO. The second is 
the requirement for causal consistency. 

Adding GWO to the evolving lattice of consistency models results in Figure 11. 
The model GPO+GDO+GWO has been previously discovered. In [Ahamad et al. 
1992] the authors noticed that the definition of processor consistency allows 
executions that violate causality, and they developed an extension to processor 
consistency to prevent this. At this point the lattice contains two new consistency 
models: GWO, and GDO+GWO. 

4.3 Sequential Consistency as a Combination of Properties 

GPO+GDO+GWO is weaker than sequential consistency. Consider the execution 
in Figure 12. The two writes are not by the same process, nor to the same 
variable, and they are not causally related. These two writes could be seen to occur 
in either order, but to be sequentially consistent every process must see them in 
the same order. 

$(w, p_1, x, 1) <_{PO} (r, p_1, y, \bot) <_{PO} (r, p_1, y, 2)$ 
$(w, p_2, y, 2) <_{PO} (r, p_2, x, \bot) <_{PO} (r, p_2, x, 1)$ 
$(w, \varepsilon, y, \bot) \mapsto (r, p_1, y, \bot)$ 
$(w, p_2, y, 2) \mapsto (r, p_1, y, 2)$ 
$(w, \varepsilon, x, \bot) \mapsto (r, p_2, x, \bot)$ 
$(w, p_1, x, 1) \mapsto (r, p_2, x, 1)$ 

Fig. 12. An Execution that Violates Sequential Consistency. 

In this execution the following cycle exists: 

$(w, p_1, x, 1) <_{PO} (r, p_1, y, \bot) <_{DO} (w, p_2, y, 2) <_{PO} (r, p_2, x, \bot) <_{DO} (w, p_1, x, 1)$ 

But GPO+GDO+GWO requires separate views for processes $p_1$ and $p_2$, and each 
view includes only its own read operations. So the following views are acceptable: 

$p_1$: $(w, p_1, x, 1) <_{p_1} (r, p_1, y, \bot) <_{p_1} (w, p_2, y, 2) <_{p_1} (r, p_1, y, 2)$ 
$p_2$: $(w, p_2, y, 2) <_{p_2} (r, p_2, x, \bot) <_{p_2} (w, p_1, x, 1) <_{p_2} (r, p_2, x, 1)$ 
For this execution to be prohibited there must be another order that takes a cycle 
which includes read operations and creates a cycle among only write operations. 
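
The situation in Figure 12 can be reproduced with a small brute-force check. The sketch below is illustrative only and rests on two assumptions of ours: values identify writes uniquely, so seriality can be tested by value, and a variable with no earlier write in a view holds the initial value (encoded as `None`), which stands in for the initial writes that the views above omit. The names `is_serial` and `has_sequential_view` are hypothetical.

```python
from itertools import permutations

BOTTOM = None  # encodes the initial value written by the initial writes

P1 = [("w", "p1", "x", 1), ("r", "p1", "y", BOTTOM), ("r", "p1", "y", 2)]
P2 = [("w", "p2", "y", 2), ("r", "p2", "x", BOTTOM), ("r", "p2", "x", 1)]

# The two per-process views given in the text for Figure 12.
VIEW_P1 = [P1[0], P1[1], P2[0], P1[2]]
VIEW_P2 = [P2[0], P2[1], P1[0], P2[2]]


def is_serial(view):
    """True if every read returns the latest earlier write to its variable."""
    latest = {}
    for kind, _process, variable, value in view:
        if kind == "w":
            latest[variable] = value
        elif latest.get(variable, BOTTOM) != value:
            return False
    return True


def respects_program_order(order, programs):
    """True if each process's operations keep their program order."""
    position = {op: k for k, op in enumerate(order)}
    return all(
        position[a] < position[b]
        for ops in programs
        for a, b in zip(ops, ops[1:])
    )


def has_sequential_view(programs):
    """Search all total orders for one that is serial and respects <PO."""
    all_ops = [op for ops in programs for op in ops]
    return any(
        respects_program_order(order, programs) and is_serial(order)
        for order in permutations(all_ops)
    )


print(is_serial(VIEW_P1), is_serial(VIEW_P2), has_sequential_view([P1, P2]))
```

Both per-process views are serial, yet the exhaustive search over the six operations finds no single serial order that respects process order, which is the gap the remaining property has to close.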

In Figure 10 there were eight cases of a causal transitive chain. Four of them were 
deemed impossible because a read could not be on the left hand side of a writes-to 
order. These cases are made possible by using data order as a generalization of 
writes-to order. A read may not be able to write to another operation, but it may 
be able to prove that it happened first. These four cases are the basis of a new 
consistency property called anti order. The name anti order comes from parallel 
compiler optimization. When a program contains a read and later a write to the 
same variable their orders cannot be reversed. This is called an anti dependency, 
and is similar to this situation where a read can prove, through data order, that a 
write happened after it. It is at least similar enough to borrow the name. 

The purpose of Global Anti Order (GAO) is to complete the set of consistency 
properties so that, together, they simulate sequential consistency. To do this, anti 
order must take cycles involving read operations, and short circuit them to 
produce cycles involving only write operations. Therefore, anti order is limited to 
the case where $o_1$ and $o_2$ (in Figure 10) are writes. This weakens anti order, 
and our desire is to produce the weakest relation that supports the assertion that 
GPO+GDO+GWO+GAO is equivalent to sequential consistency. From Figure 10, 
case 1 seems necessary because the writes may only be ordered through the reads. 
Case 2 seems unnecessary because the writes are already ordered by data order, 
but it will be needed as explained later. Case 3 reduces to case 4, and case 4 solves 
the problem of Figure 12 since 

$(w, p_1, x, 1) <_{PO} (r, p_1, y, \bot) <_{DO} (w, p_2, y, 2)$, and 
$(w, p_2, y, 2) <_{PO} (r, p_2, x, \bot) <_{DO} (w, p_1, x, 1)$, so 

$(w, p_1, x, 1) <_{AO} (w, p_2, y, 2) <_{AO} (w, p_1, x, 1)$ 

So, an initial idea is to base anti order on only cases 1 and 4. However, this 
solution is not complete. The execution in Figure 12 can be modified by removing 
the final read of each process. This means that condition 3 of data order no longer 
applies and the writes are not data ordered after $(r, p_1, y, \bot)$ and $(r, p_2, x, \bot)$. There 
is no anti order cycle, and the execution is no longer rejected by anti order even 
though it still violates sequential consistency. The problem is with a limitation of 
data order. If a read does not read from a write to the same variable this is not 
enough to deduce that the write happened after the read. It could have happened 
very early and been overwritten. However, it could not have happened between the 
read and the write that wrote-to the read. This ordering restriction is not present 
in data order. Capturing this restriction requires a non-deterministic order called 
serial order. One can think of serial order as "pseudo data order" that can replace 
writes-to order in the cases given in Figure 10. We now need to include case 2 
because $w_1 \mapsto r_1 <_{SO} w_2$ does not guarantee that the writes are data ordered. 

Definition 4.19. A Serial Order, $<_{SO}$, is a minimal set of edges that enforces 
the following condition: 

$\forall w, r \in O$ such that $w$ and $r$ are to the same variable and do not have the 
same value, either $w <_{SO} w' \mapsto r$ or $r <_{SO} w$ 
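
Definition 4.19 leaves one binary choice for each qualifying write-read pair, so the candidate serial orders of a small execution can be enumerated directly. The sketch below does this for the execution of Figure 12 with the final read of each process removed. It is an illustration under our own encoding: the initial process is named `"eps"`, the initial value is `None`, and the function names are hypothetical.

```python
from itertools import product

BOTTOM = None  # the initial value

W_INIT_X = ("w", "eps", "x", BOTTOM)
W_INIT_Y = ("w", "eps", "y", BOTTOM)
W1X = ("w", "p1", "x", 1)
R1Y = ("r", "p1", "y", BOTTOM)
W2Y = ("w", "p2", "y", 2)
R2X = ("r", "p2", "x", BOTTOM)

OPS = [W_INIT_X, W_INIT_Y, W1X, R1Y, W2Y, R2X]
# For each read, the write that wrote to it.
WRITER_OF = {R1Y: W_INIT_Y, R2X: W_INIT_X}


def qualifying_pairs(ops):
    """Write-read pairs on the same variable with different values."""
    return [
        (w, r)
        for w in ops if w[0] == "w"
        for r in ops if r[0] == "r" and r[2] == w[2] and r[3] != w[3]
    ]


def candidate_serial_orders(ops, writer_of):
    """One edge per qualifying pair: either w <SO w' (w' wrote to r) or r <SO w."""
    options = [((w, writer_of[r]), (r, w)) for w, r in qualifying_pairs(ops)]
    return [frozenset(choice) for choice in product(*options)]


CANDIDATES = candidate_serial_orders(OPS, WRITER_OF)
print(len(qualifying_pairs(OPS)), len(CANDIDATES))
```

For this execution there are two qualifying pairs, one on each variable, and therefore four candidate serial orders.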

So the final definition of anti order is as follows. 

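Definition 4.20. Anti-Order, $<_{AO}(<_{SO})$: 

Given a serial order, $<_{SO}$, 
$\forall w_1, w_2 \in O$, $w_1 <_{AO} w_2$ iff 
$\exists r_1, r_2$ such that 

$w_1 \mapsto r_1 <_{PO} r_2 <_{DO} w_2$, or 

$w_1 \mapsto r_1 <_{PO} r_2 <_{SO} w_2$, or 

$w_1 \mapsto r_1 <_{SO} w_2$, or 

$w_1 <_{PO} r_1 <_{DO} w_2$, or 

$w_1 <_{PO} r_1 <_{SO} w_2$ 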
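
The five clauses of Definition 4.20 translate directly into set comprehensions. The following sketch computes anti order for the full execution of Figure 12 with an empty serial order. It is an assumption-laden illustration: relations are sets of (before, after) pairs, `PO` is supplied already transitively closed, and `DO` lists only the two read-to-write data order edges named in the text rather than the complete data order.

```python
BOTTOM = None  # the initial value

W_INIT_X = ("w", "eps", "x", BOTTOM)
W_INIT_Y = ("w", "eps", "y", BOTTOM)
W1X, R1Y0, R1Y2 = ("w", "p1", "x", 1), ("r", "p1", "y", BOTTOM), ("r", "p1", "y", 2)
W2Y, R2X0, R2X1 = ("w", "p2", "y", 2), ("r", "p2", "x", BOTTOM), ("r", "p2", "x", 1)

WRITES_TO = {(W_INIT_Y, R1Y0), (W2Y, R1Y2), (W_INIT_X, R2X0), (W1X, R2X1)}
PO = {(W1X, R1Y0), (W1X, R1Y2), (R1Y0, R1Y2),
      (W2Y, R2X0), (W2Y, R2X1), (R2X0, R2X1)}
DO = {(R1Y0, W2Y), (R2X0, W1X)}  # partial: only the edges used in the text
SO = set()                        # empty serial order, for illustration


def anti_order(writes_to, po, do, so):
    """Pairs (w1, w2) of writes related by a clause of Definition 4.20."""
    def writes_after(read, relation):
        return {b for a, b in relation if a == read and b[0] == "w"}

    ao = set()
    # Clauses 1-3: w1 writes-to r1, then r1 <SO w2 or r1 <PO r2 <DO/SO w2.
    for w1, r1 in writes_to:
        targets = writes_after(r1, so)
        for a, r2 in po:
            if a == r1 and r2[0] == "r":
                targets |= writes_after(r2, do) | writes_after(r2, so)
        ao |= {(w1, w2) for w2 in targets}
    # Clauses 4-5: w1 <PO r1, then r1 <DO w2 or r1 <SO w2.
    for w1, r1 in po:
        if w1[0] == "w" and r1[0] == "r":
            targets = writes_after(r1, do) | writes_after(r1, so)
            ao |= {(w1, w2) for w2 in targets}
    return ao


AO = anti_order(WRITES_TO, PO, DO, SO)
print(sorted(AO))
```

With these inputs the result contains exactly the two edges of the anti order cycle between $(w, p_1, x, 1)$ and $(w, p_2, y, 2)$.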

To define global anti order there must be serial views that respect anti order for 
some definition of serial order. However, this is still not enough. In the example of 
Figure 12 with the final reads removed serial order could be defined as: 

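$(w, p_1, x, 1) <_{SO} (w, \varepsilon, x, \bot)$ 
$(w, p_2, y, 2) <_{SO} (w, \varepsilon, y, \bot)$ 

There would be no anti order links. The views could then be written: 

$p_1$: $(w, p_1, x, 1) <_{p_1} (r, p_1, y, \bot) <_{p_1} (w, p_2, y, 2)$ 
$p_2$: $(w, p_2, y, 2) <_{p_2} (r, p_2, x, \bot) <_{p_2} (w, p_1, x, 1)$ 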

These views respect process order, data order, write-read-write order, and even 
anti order for some definition of serial order. They also respect some definition of 
serial order, but not the same definition that was used to construct anti order. This 
is the crucial last piece of the puzzle. The views must respect the same definition 
of serial order that was used to construct anti order. 

Definition 4.21. An execution is Global Anti Order (GAO) iff $\exists <_{SO}$ such that 
$\forall i \in P \; \exists\, \mathrm{SerialView}(<_{iLocal} \cup <_{SO} \cup <_{AO}(<_{SO}) \,|\, (*, i, *, *) \cup (w, *, *, *))$ 

Serial order is a non-deterministic order in the sense that it may have many 
possible definitions, and if any one of the definitions accepts the execution then the 
execution is accepted. The number of possible serial orders for any execution is 
not infinite. In fact, for each pair of a read and a write with the same variable and 
a different value there is exactly one edge in serial order, and this edge is chosen 
from two choices. Therefore, the number of serial orders for an execution is exactly 
$2^x$ where $x$ is the number of such read-write pairs. When accepting executions, an 
implementation of anti-order could be conservative, and only consider a subset of 
possible serial orders. It could even deterministically choose a single serial order on 
which to accept executions. This way, the implementation could be more efficient 
without accepting any unacceptable executions. However, it might reject some 
acceptable executions. From now on, for purposes of brevity we will use serial 
order as if it were a single order. Any definition using serial order can be read 
"There exists a serial order such that. . . " 

It would be desirable if all four properties were orthogonal, but this is not the 
case. GAO is strictly stronger than GDO which is proven below. One goal of 
this work was to develop GAO to be as weak as possible while still supporting the 
assertion that GPO+GDO+GWO+GAO is equivalent to sequential consistency. 
Every candidate definition of GAO that was not stronger than GDO did not support 
equivalence to sequential consistency. This may reveal some fundamental aspect 
of consistency models, or it may merely require further research to develop such a 
definition. As a result, GDO+GAO is equivalent to just GAO. 

Lemma 4.22. If data order has a cycle, then the execution is not GAO. 

Proof: 

Case 1: The cycle has a read. Take the operation immediately before 
the read in the cycle. If it is linked by a transitive chain add that 
transitive chain to the cycle. Repeat until the operation immediately 
before the read is linked directly without a transitive chain. This is either 
a write, or by lemma 4.6 it is a read ordered by process order. If it is a 
read, repeat until a write is reached. A write must be reached because 
otherwise the cycle will return to the original read which must be ordered 
before itself by process order which is a contradiction. The write that 
is reached is directly ordered by data order before the next operation in 
the cycle which is a read. They cannot be ordered by condition 3 of data 
order because this would imply that there exists a third operation such 
that the write is process ordered before that operation, and the read 
writes to that operation. This is impossible since a read cannot write 
to another operation. So the write must be ordered before the read by 
process order or writes-to order. Also, the operations are in a cycle in 
data order so the read is data ordered before the write. In either case, 
the write is anti ordered before itself. The serial views for GAO must 
all contain this write, so they cannot respect this cycle in anti order. 
Therefore, the execution is not GAO. 

Case 2: The cycle has no reads. Once again, expand the cycle so that 
no link is a transitive chain. If the transitive chain includes a read refer 
to case 1. None of the links can result from writes-to order because a 
write cannot write to another write. The cycle must contain writes from 
at least two processes. If not, a write must be ordered by data order 
before another write earlier in process order. This must have come about 
by condition 3 of data order. Therefore, the following condition exists: 
$w_1 <_{PO} w_2 <_{PO} r$, and $w_1 \mapsto r$. All of these operations are to the same 
variable. It is impossible for this process' view to be serial and respect 
local order, so the execution is not GAO. So there must be some links 
that result from condition 3 of data order between writes by different 
processes. Pick one write, $w_1$, and follow the cycle along process order 
links until a link resulting from condition 3 is reached. In this case, 
a write, $w_2$, is process ordered before a read, $r$, which is written-to by 
another write, $w_3$, creating the link $w_2 <_{DO} w_3$. $w_1$ must also be process 
ordered before $r$ because either it is process ordered before $w_2$, or it is 
$w_2$. So $w_1 <_{DO} w_3$. Now, $w_1$ does not write to $r$, so it must be ordered 
by serial order either $w_1 <_{SO} w_3 \mapsto r$, or $r <_{SO} w_1$. The second 
case is impossible. The view for the process that submitted $w_1$ and $r$ 
must contain both operations and respect local order. The assignment 
of $r <_{SO} w_1$ would prevent this, and so this assignment could never 
be used to show that the execution is GAO. Therefore, the assignment 
must be $w_1 <_{SO} w_3$. By the same logic, follow the chain from $w_3$ to the 
next link that results from condition 3. There must be another write 
serial ordered after $w_3$. Every time the cycle switches to an operation 
by a different process, the first operation by the new process must be 
serial ordered after $w_3$. Continue around the cycle. At some point the 
cycle will change processes for the last time before reaching $w_1$. The first 
operation by this new process is either $w_1$, or a write process ordered 
before $w_1$. This write must also be serial ordered before $w_3$. Either it 
is $w_1$, or it is process ordered before $r$, and the same reasoning applies. 
This assignment of serial order has a cycle involving only writes, and so 
no process' view could respect it. We have previously shown that any 
alternate assignment would also prevent the execution from satisfying 
GAO. Therefore, the execution is not GAO. 

Theorem 4.23. GAO is strictly stronger than GDO. 

Proof: GAO is shown to be stronger by Theorem 4.12 and Lemma 4.22. 
GAO is shown to be strictly stronger by the fact that the execution in 
Figure 12 satisfies GDO and not GAO. 

All that remains is to show that the four properties together make up sequential 
consistency. Since GAO is stronger than GDO we will leave it out and prove that 
GPO+GWO+GAO is equivalent to sequential consistency. 

Lemma 4.24. Every sequentially consistent execution is GPO+GWO+GAO. 

Proof: A sequentially consistent execution has a single, serial view on 
all operations that respects $<_{PO}$. Call this view $<_{seq}$. By definition, 
$<_{seq}$ respects $<_{PO}$. If $w_1 <_{WO} w_2$ then $\exists r$ such that $w_1 \mapsto r <_{PO} w_2$. 
$<_{seq}$ respects $<_{PO}$ and is serial so it respects $\mapsto$ and therefore it respects 
$<_{WO}$. 

Now we will show that a sequentially consistent execution respects 
$<_{DO}$. This is not strictly required by the theorem, but will make it 
easier to prove that the execution satisfies $<_{AO}$. $<_{seq}$ respects $<_{PO}$ 
and is serial, and so respects the $<_{PO}$ and $\mapsto$ conditions of $<_{DO}$. If 
$o_1 <_{PO} r$, and $o_2 \mapsto r$, and $o_1$ has a different value than $r$ then $o_1$ must 
come before $o_2$ in $<_{seq}$, or the view will not be serial. If this were not 
so then $o_1$ must come between $o_2$ and $r$ because $o_1 <_{PO} r$ and $<_{seq}$ 
respects $<_{PO}$. There are two cases: $o_1$ is either a write or a read. If $o_1$ 
is a write then $r$ does not read from the most recent write and $<_{seq}$ is 
not serial. If $o_1$ is a read then either $o_1$ does not read from the most 
recent write, or there is a write to the same variable with the same value 
as $o_1$ between $o_2$ and $o_1$ in which case $r$ does not read from the most 
recent write and $<_{seq}$ is not serial. Therefore, $<_{seq}$ respects condition 3 
of $<_{DO}$. $<_{seq}$ is a total order. Since it respects the first three conditions 
of $<_{DO}$ it will respect the transitive closure condition. 

To prove that a sequentially consistent execution is GAO, define a 
serial order, $<_{SO}$, with edges in the same order as $<_{seq}$. This is possible 
because if $\exists w, w', r$ such that $w' \mapsto r$ and $w \neq w'$ then it cannot be that 
$w' <_{seq} w <_{seq} r$ because then $<_{seq}$ would not be serial. $w$ must be 
ordered either before $w'$ or after $r$. If $\exists w_1, w_2$ such that $w_1 <_{AO} w_2$ 
then $\exists r_1, r_2$ such that $w_1 \mapsto r_1 <_{PO} r_2 <_{DO} w_2$, or $w_1 \mapsto r_1 <_{PO} 
r_2 <_{SO} w_2$, or $w_1 \mapsto r_1 <_{SO} w_2$, or $w_1 <_{PO} r_1 <_{DO} w_2$, or $w_1 <_{PO} 
r_1 <_{SO} w_2$. $<_{seq}$ respects $\mapsto$, $<_{PO}$, $<_{DO}$, and $<_{SO}$ so therefore respects 
$<_{AO}(<_{SO})$. 

So $<_{seq}$ respects $<_{PO}$, $<_{WO}$, $<_{SO}$, $<_{AO}(<_{SO})$, is serial, and contains 
all operations so it can be used to construct the required per-process 
views for all processes: 

$\forall i \in P \; \exists\, \mathrm{SerialView}(<_{iLocal} \cup <_{PO} \cup <_{WO} \cup <_{SO} \cup <_{AO}(<_{SO}) \,|\, (*, i, *, *) \cup (w, *, *, *))$ 

so the execution is GPO+GWO+GAO. 

Lemma 4.25. For any GPO+GWO+GAO execution the per-process views can 
be constructed where all write operations occur in the same order in all views. 

Proof: Because $<_{iLocal}$ is a subset of $<_{PO}$ we will ignore it and just 
show that the constructed views respect $<_{PO}$, $<_{WO}$, $<_{SO}$, $<_{AO}(<_{SO})$, 
and are serial. There must be an initial definition of serial order for which 
the execution satisfies GPO+GWO+GAO. This definition of serial order 
is not changed throughout this proof. That is, the final constructed views 
satisfy GPO+GWO+GAO for the same definition of serial order as the 
initial views. All initial writes must be ordered first in all views because 
all initial writes are ordered before any other operation by $<_{PO}$. These 
initial writes can come in any order because they are not ordered with 
respect to each other, and there are no reads between them, so place 
them in the same order in all views. For any two views $<_i$ and $<_j$, the 
first write that is not an initial write in $<_i$ can be placed first in $<_j$. 
Then the next write in $<_i$ can be placed second in $<_j$, and so on. We 
will use an inductive proof to show that this reordering can be done and 
the resulting views will still respect $<_{PO}$, $<_{WO}$, $<_{SO}$, $<_{AO}(<_{SO})$, and are 
serial. The inductive proof uses the following definitions and invariants: 

(1) The order $<$ is defined as $<_{PO} \cup <_{WO} \cup <_{SO} \cup <_{AO}(<_{SO})$. 

(2) The views $<_i$ and $<_j$ respect $<$ and are serial. 

(3) The write operation being moved is called $w_1$. 

(4) Point A is the place in $<_j$ where $w_1$ will be moved to. 

(5) Point B is the place in $<_j$ where $w_1$ is being moved from. 

(6) Point B is after point A in $<_j$. 

(7) All write operations ordered before $w_1$ in $<_i$ are before point A in $<_j$. 

(8) Corollary: All write operations ordered before $w_1$ by $<$ are before 
point A in $<_j$ because $<_i$ respects $<$. 

The execution is GPO+GWO+GAO so there must exist initial views 
$<_i$ and $<_j$ that respect $<$ and are serial. In the initial case, point A 
is just after the initial writes of $<_j$. $w_1$ is the first non-initial write in 
$<_i$ so only the initial writes are ordered before it in $<_i$ and they are all 
before point A in $<_j$. $w_1$ is after the initial writes in $<_j$ so point B is 
after point A in $<_j$. 

Consider all the operations between A and B. These must all be either 
read operations by process j, or write operations not ordered before $w_1$ 
by $<$. Construct the set of prior reads as follows. The variable that 
$w_1$ operates on will be referred to as x. Any read between A and B to 
variable x is a prior read. Also, any read between A and B ordered by 
process order before $w_1$ or a prior read is a prior read. Then construct 
the set of remaining operations as all reads between A and B that are 
not prior reads plus all writes between A and B. Now, we will show that 
$w_1$ or any prior read cannot be ordered after any remaining operation. 

Case 1: $w_1$ was submitted by process j. Every read between A and B 
is a prior read. The remaining operations are all writes and cannot be 
ordered by $<$ before $w_1$ by the invariant. The remaining operations 
also cannot be ordered before any prior read by $<$. They cannot be 
ordered by $<_{PO}$ because the write would be by process j and so would 
be ordered before $w_1$ which is a contradiction. A read and a write cannot 
be ordered by $<_{WO}$ or $<_{AO}(<_{SO})$ because those two orders only occur 
between pairs of write operations. Also, a read cannot be ordered after 
a write by $<_{SO}$. 

Case 2: $w_1$ was not submitted by process j. If a remaining operation is 
a read it is by process j so it cannot be ordered before $w_1$ by $<_{PO}$. The 
remaining read also cannot be ordered before $w_1$ by $<_{WO}$ or $<_{AO}(<_{SO})$ 
because those orders only occur between pairs of write operations. The 
remaining read cannot be ordered before $w_1$ by $<_{SO}$ because the read 
would be to the same variable as $w_1$, and so would be a prior read. 
The remaining read cannot be ordered before any prior read because all 
reads are by process j so it would be ordered before a prior read by $<_{PO}$ 
making it a prior read. 

If the remaining operation is a write it cannot be ordered by $<$ before 
$w_1$ by the invariant. It cannot be ordered before a prior read by $<_{WO}$ or 
$<_{AO}(<_{SO})$ because those only order pairs of writes. It cannot be ordered 
before a prior read by $<_{SO}$ because a read cannot be ordered after a 
write by $<_{SO}$. All that remains is to show that a remaining operation 
which is a write cannot be ordered before a prior read by $<_{PO}$. Any 
prior read, $r$, comes before $w_1$ in $<_j$. The write, $w_2$, which wrote to $r$ 
must also come before $w_1$ because $<_j$ is serial. If $r$ is to the same variable 
as $w_1$ then either $w_1 <_{SO} w_2$, or $r <_{SO} w_1$. Since $<_j$ respects $<_{SO}$ it 
must be the case that $r <_{SO} w_1$. If a remaining operation, $w_3$, is ordered 
before a prior read, $r_1$, by $<_{PO}$ then either $r_1$ is to the same variable as 
$w_1$ in which case $r_1 <_{SO} w_1$, or $r_1$ is ordered by $<_{PO}$ before $r_2$ which 
is to the same variable as $w_1$ in which case $r_2 <_{SO} w_1$. Therefore, 
$w_3 <_{PO} (r_1 <_{PO})\, r_2 <_{SO} w_1$ so $w_3 <_{AO} w_1$ which is a contradiction of 
the invariant. 

In either case, w1 and all prior reads are not ordered after any remaining
operations by <. Now <j is changed as follows: all prior reads are placed
immediately before point A, preserving their order, followed by w1. All
other operations preserve their order. For all pairs of operations that
change their relative position, one must be w1 or a prior read. The other
must be a remaining operation. These pairs cannot be ordered by < so
the view still respects <.

Before the move, <j was serial so each prior read must have read from
the most recent write to that variable. That write must have been before
point A because it is anti ordered before w1. The write must still be the
most recent write to the same variable because the moved read is after
all writes before point A, and every write between the two was there
before the move when <j was serial. Remaining reads maintained their
relative position with all writes except w1. Remaining reads cannot be
to variable x, and so they too must still read from the most recent write.
No other pairs of reads and writes changed relative position so <j must
still be serial.

Now, move point A to immediately after w1. The next write in <j
becomes the new w1. This write has not been moved to before point A
in <j so point B is still after point A. The set of writes before w1 in
<i have all been moved to before point A in <j, so the invariants are
satisfied. Therefore, by induction one can create views for all processes
that respect < and have the write operations in the same order in all
views.

Lemma 4.26. For any GPO+GWO+GAO execution it is possible to construct 
a single view containing all operations that respects process order and is serial. 

Proof: From Lemma 4.25 create views which all have the write operations
in the same order. These orders respect <po and are serial. Then
take one of these views and add the read operations of all other processes
in the same relative position to the writes as they occur in their own
view. The read operations must all be ordered by <po correctly with
respect to all writes because the writes occur in the same order in every
view. Reads ordered with respect to each other by <po come from the
same view, and so they are placed in that order in the new view. The
serial property is not affected by the relative position of pairs of reads,
and every read operation is in the same position relative to all writes,
so the view must be serial.
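
The construction in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: operations are written as (kind, process, variable, value) tuples, every input view is assumed to contain the same writes in the same order (as Lemma 4.25 provides), and the function name merge_views is hypothetical.

```python
def merge_views(views):
    """Merge per-process views that share one write order into a single view.

    Each view is a list of (kind, process, variable, value) tuples containing
    all writes plus the reads of one process.
    """
    write_order = [op for op in views[0] if op[0] == "w"]
    # slots[k] collects the reads that follow exactly k writes in their own view.
    slots = [[] for _ in range(len(write_order) + 1)]
    for view in views:
        # Assumption from Lemma 4.25: identical write order in every view.
        assert [op for op in view if op[0] == "w"] == write_order
        writes_seen = 0
        for op in view:
            if op[0] == "w":
                writes_seen += 1
            else:
                slots[writes_seen].append(op)
    merged = list(slots[0])
    for k, write in enumerate(write_order):
        merged.append(write)
        merged.extend(slots[k + 1])
    return merged


view_p1 = [("w", "p1", "x", 1), ("r", "p1", "x", 1), ("w", "p2", "x", 2)]
view_p2 = [("w", "p1", "x", 1), ("w", "p2", "x", 2), ("r", "p2", "x", 2)]
single_view = merge_views([view_p1, view_p2])
```

Each read keeps its position relative to every write, which is the property the proof relies on to conclude that the merged view is serial.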

Theorem 4.27. GPO+GWO+GAO is equivalent to sequential consistency. 

Proof: Follows directly from lemmas 4.24 and 4.26. 

Adding GAO almost completes the lattice as shown in Figure 13. Since GAO
is stronger than GDO any box labeled with GAO will also enforce GDO, but that
is not shown for brevity. The lattice now has three additional new consistency
models: GAO, GPO+GAO, and GWO+GAO. The lattice is almost complete, but
it does not yet contain slow consistency. Slow consistency would be located below
both PRAM and cache, and above local.

4.4 Slow Consistency as a Combination of Properties 

In slow consistency [Hutto and Ahamad 1990], two operations must maintain their 
order only if they are by the same process and to the same variable. This leads to 
the following definitions. 

Definition 4.28. Two operations are ordered by process-data order, o1 <pdo o2,
iff o1 <po o2 and o1 <do o2.

Definition 4.29. An execution is Global Process-Data Order (GPDO) iff

$$\forall i \in P \ \exists\ SerialView(<_{iLocal} \cup <_{pdo} \mid (*, i, *, *) \cup (w, *, *, *))$$

Theorem 4.30. GPDO is equivalent to slow consistency.

Proof: For any GPDO execution, take the view for a single processor. 
Divide this view into separate views, one for each variable by restricting 
the set of operations to operations on a single variable, but maintaining 
their relative order. Process-data order contains all edges in process 
order between operations to the same variable. These views respect 
process-data order, and contain only operations to a single variable so 
they respect process order. These views are exactly what is required to 
satisfy slow consistency. 

For any slow consistent execution, gather together the views over all
variables for a particular processor. By similar logic to Lemma 4.9,
the union of these views and <iLocal must be acyclic. The union of
the views must contain every edge in process-data order. Therefore,
any topological sort of the union of the views and <iLocal must respect
<iLocal U <pdo. Also, each view is serial. In the topological sort,
every pair of operations to the same variable must preserve their relative
position so the topological sort must be serial. The topological sort is
exactly what is required to satisfy GPDO.
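
As a rough illustration of the order used in this proof, the sketch below collects the edges of process order between operations on the same variable, which the proof states are all contained in process-data order. The representation of an execution as a dict from process name to its operations in local order, and the name process_data_order, are assumptions made for this example only; the full definition of <do is not reproduced here.

```python
from itertools import combinations


def process_data_order(program):
    """Return pairs (o1, o2) issued by the same process, in that order,
    that access the same variable.

    program maps a process name to its list of
    (kind, process, variable, value) operations in local order.
    """
    edges = set()
    for ops in program.values():
        # combinations() yields pairs in list order, so o1 precedes o2.
        for o1, o2 in combinations(ops, 2):
            if o1[2] == o2[2]:
                edges.add((o1, o2))
    return edges


# The operations of Figure 16.
program = {
    "p9": [("w", "p9", "d", 1), ("w", "p9", "e", 2)],
    "p10": [("r", "p10", "e", 2), ("w", "p10", "d", 3), ("r", "p10", "d", 1)],
}
pdo_edges = process_data_order(program)
```

For Figure 16 only one pair qualifies: the write and the later read of d by p10. The two writes by p9 touch different variables, so slow consistency leaves them unordered.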

GPDO is more than just a new statement of slow consistency. It represents a new
way of combining consistency properties. We have already seen GPO+GDO as a
way to combine two models to produce a stronger model. Now, GPDO combines two
models to produce a weaker model. This could be done for any pair of properties.
For example, process-anti order orders only operations that are ordered by both
process order and anti order. GPAO would be weaker than both GPO and GAO.
However, it is questionable how useful models this weak would be. Slow consistency
is essentially only valuable in defining synchronized models. Perhaps these models
would be usable with a transition theory, and higher consistency operations between
them for synchronization.

Fig. 13. The Complete Lattice of Consistency Models (top element: GPO+GWO+GAO, Sequential)

p5: (w,p5,a,1)  (r,p5,a,2)
p6: (w,p6,a,2)  (r,p6,a,1)

Fig. 14. An Execution That Violates GDO

p7: (r,p7,b,2)  (w,p7,c,1)
p8: (r,p8,c,1)  (w,p8,b,2)

Fig. 15. An Execution That Violates GWO

4.5 A Lattice of Consistency Models

The result of these composable consistency properties is the lattice of consistency 
models shown in Figure 13. Every possible combination of properties produces 
a model represented by a box in the lattice. The top of the lattice is sequential 
consistency, and the bottom is local consistency. Every pair of models has a unique 
least upper bound and greatest lower bound. There are other combinations of
properties demonstrated in this work such as GPOnGDO, and GPAO. These are 
not shown in the lattice for brevity, and because their utility is unknown. GPDO 
is shown in the lattice because slow consistency is a well known and widely used 
model. 

One can think of every box in the lattice as representing a set of executions that 
satisfies that model, and no stronger model in the lattice. To show that every 
box of the lattice is non-empty we provide example executions that violate each of 
the four consistency properties. To derive an example execution for a particular 
box, combine the executions violating all the properties not contained in that box. 
Figure 12 given when defining anti-order in Subsection 4.3 provides an execution 
that violates GAO without violating any of the other three properties. 

Figure 14 provides an execution that violates GDO (and thus GAO), but does 
not violate GPO or GWO. From condition 3 of data order: 

(w,p5,a,1) <do (w,p6,a,2) <do (w,p5,a,1)

Therefore, there is a cycle in <do so the execution is not GDO. However,
write-read-write order is empty. The following views satisfy <po and <wo, and are
serial.

p5: (w,p5,a,1) <5 (w,p6,a,2) <5 (r,p5,a,2)
p6: (w,p6,a,2) <6 (w,p5,a,1) <6 (r,p6,a,1)
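
The serial property of views such as these can be checked mechanically. The following is a minimal sketch, assuming operations are (kind, process, variable, value) tuples and that a read with no preceding write in the view is rejected, since initial values are not modeled here; the name is_serial is hypothetical.

```python
def is_serial(view):
    """True if every read returns the value of the most recent write
    to the same variable earlier in the view."""
    last_written = {}
    for kind, _process, variable, value in view:
        if kind == "w":
            last_written[variable] = value
        elif last_written.get(variable) != value:
            # Also rejects reads that precede every write to the variable.
            return False
    return True


# The two views given for Figure 14.
view_p5 = [("w", "p5", "a", 1), ("w", "p6", "a", 2), ("r", "p5", "a", 2)]
view_p6 = [("w", "p6", "a", 2), ("w", "p5", "a", 1), ("r", "p6", "a", 1)]
# Giving p6 the write order of p5 makes its read stale.
stale = [("w", "p5", "a", 1), ("w", "p6", "a", 2), ("r", "p6", "a", 1)]
```

Both views are serial even though they disagree on the order of the two writes, which is why the execution satisfies GPO and GWO but not GDO.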

Figure 15 provides an execution that violates GWO, but does not violate GPO, 
GDO, or GAO. The following cycle exists. 

(w,p7,c,1) <wo (w,p8,b,2) <wo (w,p7,c,1)

These two writes must be present in all views, so no view can respect <wo. Each
write is data ordered before the read it writes to. Serial order and anti order are
empty. The following views satisfy <po, <do, <ao(<so), <so, and are serial.

p7: (w,p8,b,2) <7 (r,p7,b,2) <7 (w,p7,c,1)
p8: (w,p7,c,1) <8 (r,p8,c,1) <8 (w,p8,b,2)

p9:  (w,p9,d,1)   (w,p9,e,2)
p10: (r,p10,e,2)  (w,p10,d,3)  (r,p10,d,1)

Fig. 16. An Execution That Violates GPO

To produce an execution that satisfies only GPO and no stronger model in the
lattice, define an execution containing p5 and p6 from Figure 14 and p7 and p8 from
Figure 15. Likewise, to create an execution satisfying only GPO+GDO combine
Figure 12 with Figure 15, and so forth.

Figure 16 provides an execution that violates GPO, but does not violate GDO,
GWO, or GAO. In order for the view for p10 to be serial, (w,p9,e,2) must come
before (r,p10,e,2), and (w,p9,d,1) must come after (w,p10,d,3). In order to respect
local order, (r,p10,e,2) must come before (w,p10,d,3). Therefore, (w,p9,e,2) must
come before (w,p9,d,1), which does not respect <po.

The following are the definitions of <do and <wo for this execution.

(w,p10,d,3) <do (w,p9,d,1) <do (r,p10,d,1)
(w,p9,e,2) <do (r,p10,e,2)
(w,p9,e,2) <wo (w,p10,d,3)

With the following definition of serial order, anti order is empty.

(w,p10,d,3) <so (w,p9,d,1)

The following view for p10 satisfies <do, <wo, <ao(<so), <so, <p10Local, and
is serial.

p10: (w,p9,e,2) <10 (r,p10,e,2) <10 (w,p10,d,3) <10 (w,p9,d,1) <10 (r,p10,d,1)

However, the view for p9 is not as simple. The following cycle exists.

(w,p10,d,3) <so (w,p9,d,1) <p9Local (w,p9,e,2) <wo (w,p10,d,3)

No view can be written for p9 that satisfies GWO+GAO. However, separate
views can be written, one that satisfies GWO, and one that satisfies GAO.

p9(GWO): (w,p9,d,1) <9 (w,p9,e,2) <9 (w,p10,d,3)
p9(GAO): (w,p10,d,3) <9 (w,p9,d,1) <9 (w,p9,e,2)

Therefore, this execution satisfies GAO, and no stronger model in the lattice. 
It also satisfies GWO, and no stronger model in the lattice. By combining this 
execution with Figure 12 we achieve an execution that satisfies only GDO. All that 
remains is to find executions that satisfy GWO+GAO and GDO+GWO. 
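
The argument about p9 in Figure 16 comes down to whether a set of ordering edges is acyclic. The sketch below tests exactly that with Python's standard graphlib module (Python 3.9 or later). It checks only that some total order respects the edges, not that the resulting view is serial, and the name has_view is an assumption made for this example.

```python
from graphlib import CycleError, TopologicalSorter


def has_view(ops, edges):
    """True if some total order of ops respects every (before, after) edge."""
    sorter = TopologicalSorter({op: set() for op in ops})
    for before, after in edges:
        sorter.add(after, before)  # 'before' is a predecessor of 'after'
    try:
        sorter.prepare()  # raises CycleError when the edges contain a cycle
        return True
    except CycleError:
        return False


w9d = ("w", "p9", "d", 1)
w9e = ("w", "p9", "e", 2)
w10d = ("w", "p10", "d", 3)
ops = [w9d, w9e, w10d]

serial_edge = (w10d, w9d)  # <so
local_edge = (w9d, w9e)    # <p9Local
wo_edge = (w9e, w10d)      # <wo

both = has_view(ops, [serial_edge, local_edge, wo_edge])
gwo_only = has_view(ops, [local_edge, wo_edge])
gao_only = has_view(ops, [serial_edge, local_edge])
```

With all three edges no ordering exists, while dropping either the <so edge or the <wo edge leaves an acyclic set, matching the separate GWO and GAO views given for p9.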

Figure 17 satisfies GWO+GAO, but not GPO+GWO+GAO. Below is the definition
of <do.

(w,p12,f,2) <do (w,p11,f,1) <do (r,p12,f,1)
(w,p11,g,4) <do (w,p12,g,3) <do (r,p11,g,3)

p11: (w,p11,f,1)  (w,p11,g,4)  (r,p11,g,3)
p12: (w,p12,g,3)  (w,p12,f,2)  (r,p12,f,1)

Fig. 17. An Execution That Satisfies GWO+GAO



The following definition of serial order must be chosen.

(w,p11,g,4) <so (w,p12,g,3)

If not then (w,p11,g,4) must be ordered after (r,p11,g,3), which violates the
order <p11Local. Likewise for (w,p12,f,2) and (r,p12,f,1), so that
(w,p12,f,2) <so (w,p11,f,1). <wo and <ao(<so) are empty. The following cycle exists.

(w,p11,g,4) <so (w,p12,g,3) <po (w,p12,f,2) <so (w,p11,f,1) <po (w,p11,g,4)

Therefore, it is impossible for any view to respect both <po and <so. So the
execution is not GPO+GAO, and hence it is not GPO+GWO+GAO. However,
this execution is GWO+GAO as the following views demonstrate.

p11: (w,p12,f,2) <11 (w,p11,f,1) <11 (w,p11,g,4) <11 (w,p12,g,3) <11 (r,p11,g,3)
p12: (w,p11,g,4) <12 (w,p12,g,3) <12 (w,p12,f,2) <12 (w,p11,f,1) <12 (r,p12,f,1)

To create an execution that satisfies GDO+GWO and no stronger model, combine
this execution with Figure 12.

The complete lattice as shown in Figure 13 is a powerful new way to describe
and organize consistency models. Every non-synchronized model described in
Section 2 is encompassed by the lattice model. In addition, five new consistency
models are uncovered by the symmetry of the lattice. Every model in the lattice
has a non-empty set of executions which satisfy that model and no stronger model
in the lattice. Finally, new consistency properties would be easy to integrate into
the lattice if they are discovered. Synchronized models are not covered directly by
the lattice. Instead, synchronized models can be viewed as processes submitting
some operations under one consistency model, and some operations under another
consistency model, i.e. a consistency transition. Synchronized models will be
covered in Section 5 on consistency transitions. The lattice model facilitates the
definition of consistency transitions because any two models are easily compared
by the properties they enforce.

5. CONSISTENCY TRANSITIONS 

Our final generalization of consistency models is the idea of consistency transitions.
In synchronized consistency models, a program executes ordinary operations with
a relaxed consistency model, usually slow consistency. Occasionally, the program
executes synchronization operations with a stronger consistency model, usually
sequential consistency. These synchronization operations enforce additional ordering
restrictions between ordinary operations. This can be viewed as a consistency
transition where the process executing a synchronization operation temporarily
requests a stronger level of consistency. Our goal is to develop a general theory of
consistency transitions between any two consistency models, not just slow and
sequential.

p1: (r,p1,y,2)  (sw,p1,z,3)  (w,p1,x,1)
p2: (r,p2,x,1)  (sw,p2,z,4)  (w,p2,y,2)

Fig. 18. An Execution That Violates Weak Consistency

Synchronized models require the following.

(1) All synchronization operations must be sequentially consistent. 

(2) All ordinary operations must be slow consistent. 

(3) The order <D must be respected between synchronization and ordinary
operations.

Sequential consistency is equivalent to GPO+GWO+GAO. So the first condition 
can be satisfied with serial views on synchronization operations. 

$$\forall i \in P \ \exists\ SerialView(<_{iLocal} \cup <_{po} \cup <_{wo} \cup <_{so} \cup <_{ao(<_{so})} \mid (sr, i, *, *) \cup (sw, *, *, *))$$

Weak consistency does not include acquire and release operations. Instead,
synchronization operations are special read and write operations. To distinguish them
we use the operation types sr for synchronized read and sw for synchronized write. 
Remember that for other synchronized models the writes-to relation is defined with 
acquire operations treated as reads, and release operations treated as writes. If an 
acquire is defined as an sr and a release as an sw this definition is equally valid for 
every synchronized model. 

Slow consistency is equivalent to GPDO. So the second condition can be satisfied 
by serial views on ordinary operations. 

$$\forall i \in P \ \exists\ SerialView(<_{iLocal} \cup <_{pdo} \mid (or, i, *, *) \cup (ow, *, *, *))$$

The operation type or is used for ordinary read, and ow for ordinary write. The 
views for synchronization and ordinary operations are very similar. They each have 
one view per processor, and each view contains the reads of that processor plus all 
writes. It would be nice to combine these views into a single view for each processor 
containing both synchronization and ordinary operations. The view would have to 
respect the ordering among synchronization operations, <synch,

$$<_{synch} \ =\ <_{po} \cup <_{wo} \cup <_{so} \cup <_{ao(<_{so})} \mid (sr, i, *, *) \cup (sw, *, *, *)$$

and the ordering among ordinary operations, <ord,

$$<_{ord} \ =\ <_{pdo} \mid (r, i, *, *) \cup (w, *, *, *)$$

and it would have to respect <iLocal and <D. However, this straightforward
approach has some problems.

Figure 18 satisfies all of these properties and still does not satisfy weak
consistency, as the following views demonstrate.

p1: (sw,p2,z,4) <1 (w,p2,y,2) <1 (r,p1,y,2) <1 (sw,p1,z,3) <1 (w,p1,x,1)
p2: (sw,p1,z,3) <2 (w,p1,x,1) <2 (r,p2,x,1) <2 (sw,p2,z,4) <2 (w,p2,y,2)

Notice that the synchronized writes are unordered by <synch. They may occur
in either order, but in this execution they are seen to occur in different orders by
different processes. Does this violate the assertion that synchronization operations
must be sequentially consistent? After all, the synchronization operations by
themselves, ignoring ordinary operations, are sequentially consistent. The reason for this
conundrum comes from a slight discrepancy between the intuitive definition and the
formal definition of sequential consistency. The intuitive definition can be stated
like this.

There is a single total order of events, and all processes agree that the 
events happened in that order. 

However, the formal definition requires that there exist at least one order of 
events that every process can agree on. There may be more than one order of 
events that would satisfy every process, and there is no way to distinguish a single 
correct order from the sequentially consistent operations alone. This problem is 
not an artifact of our definition of GPO+GWO+GAO. It can still occur with the 
original definition of sequential consistency. Below is a restatement of the definition 
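given previously for synchronized consistency models except that the positions of
$\forall i \in P, x \in V$ and $\exists <_{seq} = \ldots$ have been reversed.

$$\forall i \in P, x \in V \ \exists <_{seq} = SerialView(<_{po} \mid (s, *, *, *)), \text{ and}$$
$$<_{S} = \text{the transitive closure of } <_{D} \cup <_{seq}, \text{ and}$$
$$\exists\ SerialView(<_{S} \cup <_{po} \mid (*, i, x, *) \cup (w, *, x, *))$$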

The synchronized operations are sequentially consistent, but each process gets to
choose its own definition of <seq. This causes the same problem. The original
definition resolved this problem by requiring that the definition of <S for every
process be based on a single definition of <seq. This same strategy can be used
with GPO+GWO+GAO to generate the definition given below. Note that all
synchronized reads must be included in every view. This will be addressed later.

Theorem 5.1. The following definition is equivalent to synchronized model
consistency

$$\forall i \in P \ \exists\ SerialView(<_{iLocal} \cup <_{synch} \cup <_{ord} \cup <_{D} \mid (or, i, *, *) \cup (ow, *, *, *) \cup (sr, *, *, *) \cup (sw, *, *, *))$$

and all synchronization operations appear in the same order in every view.

Proof: For an execution that satisfies the above views, construct the
synchronized consistent views of the original definition as follows. The view
<seq is the total order on synchronization operations that occurs in every
view. Any two ordinary operations ordered by <S must be ordered by
a transitive chain containing only synchronization operations. The
per-process views contain all synchronization operations and respect <D
and <seq so they must also respect <S. Construct the per-process
per-variable slow consistent views required by weak consistency from the
per-process GPDO consistent views as shown in Theorem 4.30. The
new views will respect the old views so they will respect <S.

p1: (r,p1,y,2)  (sw,p1,z,3)  (w,p1,x,1)
p2: (r,p2,x,1)  (sw,p2,z,4)  (w,p2,y,2)

Fig. 19. Linearizability for Synchronization Operations

For an execution that satisfies the original definition of synchronized
consistency, construct the above views as follows. Begin with all the
synchronization operations in the order specified by <seq. This is the
order in which they will appear in every per-process view. The
synchronization operations must respect <synch because by Lemma 4.24 every
sequentially consistent view respects process order, write-read-write
order, serial order, and anti order. Any single per-process, per-variable
slow consistent view can always be combined with the synchronization
operations in a way that respects <D because the view respects <S,
which is the transitive closure of <seq and <D. Combine all slow consistent
views with the synchronization operations in this way, ignoring, for
now, the order between operations from different slow consistent views.
The resulting view will respect <synch, <ord, and <D. All that remains
is to show that it respects <iLocal.

Between two synchronization operations, the ordinary operations can
always be rearranged as a topological sort of <ord U <iLocal, which is
acyclic by Lemma 4.9. Two ordinary operations separated by
synchronization operations cannot be out of order with respect to <iLocal
because that would require

o1 <D s1 <seq s2 <D o2 <iLocal o1

This implies that s2 is process ordered before s1, but appears after it
in <seq, which is a contradiction.
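
The extra condition in Theorem 5.1, that all synchronization operations appear in the same order in every view, is easy to test on concrete views. The sketch below is illustrative only: it assumes operations are (kind, process, variable, value) tuples with kinds "sr" and "sw" marking synchronization operations, and the helper names are hypothetical. It is applied to the two views given for Figure 18.

```python
def sync_order(view):
    """The synchronization operations of a view, in view order."""
    return [op for op in view if op[0] in ("sr", "sw")]


def same_sync_order(views):
    """True if every view lists the synchronization operations identically."""
    orders = [sync_order(view) for view in views]
    return all(order == orders[0] for order in orders)


view_p1 = [
    ("sw", "p2", "z", 4), ("w", "p2", "y", 2), ("r", "p1", "y", 2),
    ("sw", "p1", "z", 3), ("w", "p1", "x", 1),
]
view_p2 = [
    ("sw", "p1", "z", 3), ("w", "p1", "x", 1), ("r", "p2", "x", 1),
    ("sw", "p2", "z", 4), ("w", "p2", "y", 2),
]
agree = same_sync_order([view_p1, view_p2])
```

The two processes see the synchronized writes to z in opposite orders, so these views fail the condition even though each view is serial on its own.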

Should all processes be required to see the same total order of synchronization 
operations, or is it sufficient that the synchronization operations are sequentially 
consistent? We argue that sequential consistency of synchronization operations
should be sufficient even if this allows different processes to see different total orders.
First, we feel that the intuitive definition is in fact enforcing a consistency model 
stronger than sequential. For example, linearizability [Herlihy and Wing 1990] 
assumes the existence of a global Newtonian clock. The processes may not have 
access to this clock, but it does exist. Each operation spans a certain period of 
time. A linearizable execution must be sequential, and in addition if two operations 
have non-overlapping time spans they must appear in the sequential view in that 
order. Perhaps this problem would be solved if synchronization operations were 
linearizable, and ordinary operations had defined time spans and were forced to 
respect certain linearizable restrictions with synchronization operations. 
For example, Figure 19 shows how linearizability could solve this problem for
Figure 18. Even if the time spans for (sw,p1,z,3) and (sw,p2,z,4) overlap, i.e.
they can be seen in either order, there is no way that (r,p1,y,2) and (w,p2,y,2)
can overlap while (r,p2,x,1) and (w,p1,x,1) also overlap. The definitions given for
synchronized consistency models explicitly state that synchronization operations
must be sequentially consistent. However, the implementations given with those
definitions implicitly enforce linearizability over synchronization operations. The
authors of the various models did not appreciate the effect of this slight distinction.

Another reason not to require every process to see the same total order is once
again the argument over the distinction between memory model and programming
model. The reader may have noticed that Figure 18 does not implement any kind
of mutual exclusion or barrier behavior. The program does not know in which order
the synchronized writes occurred, but is relying on the fact that they occurred in
the same order at all processes. If the program knows that two synchronization
operations occurred in a particular order the problem disappears. If the operations
are ordered by <synch then they must appear in that order in all views. In our
opinion, if the programmer needs two operations to occur in the same order in all
views then the control and data flow of the program must be able to detect in what
order they occurred. This is part of the programming model, not the consistency
model. In particular, this problem does not occur for data-race-free programs
because every pair of conflicting ordinary operations is separated by synchronization
operations with control or data dependencies, i.e. the synchronization operations
must be ordered by <synch. We propose to re-define <S for synchronized
consistency models. Rather than being the transitive closure of <D U <seq it should be
the transitive closure of <D U <synch. Essentially, the synchronization operations
must be sequentially consistent, and if the program can tell that two
synchronization operations happened in a particular order then they must be placed in that
order in all processes' views. This leads to a revised definition of synchronized model
consistency.

Definition 5.2. For a given definition of <D, an execution is synchronized model
consistent with the new definition <S iff

$$\exists <_{seq} = SerialView(<_{po} \mid (s, *, *, *)), \text{ and}$$
$$<_{S} = \text{the transitive closure of } <_{D} \cup <_{synch}, \text{ and}$$
$$\forall i \in P, x \in V \ \exists\ SerialView(<_{S} \cup <_{po} \mid (*, i, x, *) \cup (w, *, x, *))$$

Theorem 5.3. The following definition is equivalent to synchronized model
consistency with the new definition <S

$$\forall i \in P \ \exists\ SerialView(<_{iLocal} \cup <_{synch} \cup <_{ord} \cup <_{D} \mid (or, i, *, *) \cup (ow, *, *, *) \cup (sr, *, *, *) \cup (sw, *, *, *))$$

Proof: For an execution that satisfies the above views, construct the
synchronized consistent views as follows. The order <seq is taken from
any of the views as they all contain all synchronization operations. Any
two ordinary operations ordered by <S must be ordered by a transitive
chain containing only synchronization operations. The per-process
views contain all synchronization operations and respect <D and <synch
so they must also respect <S. Construct the per-process per-variable
slow consistent views required by weak consistency from the per-process
GPDO consistent views as shown in Theorem 4.30. The new views will
respect the old views so they will respect <S.

For an execution that satisfies the new definition of synchronized
consistency, construct the above views as follows. There must be at least
one order of synchronization operations that respects <synch because
<seq exists. Furthermore, each per-process, per-variable slow consistent
view respects <S so it can always be combined with an ordering of
synchronization operations that respects <synch and <D. By similar logic
as above, combine all operations for a single process into a single view,
and the view will respect <iLocal.

Now we will deal with the fact that every synchronized read must be placed in
every view. The proof above relies on the fact that if o1 <S o2 then o1 and o2 must
be placed in that order in every view in which they both occur. This is enforced by
the fact that every view contains all synchronization operations and respects <D
and <synch. If some view were not to contain some synchronized reads this might
not hold. There are two cases in which ordinary operations can be ordered by <S.
Case 1, o1 and o2 are linked by a transitive chain containing at least one sw. In
this case, the sw will be in every view so we can just link the ordinary operations
to the synchronized write instead of any possible synchronized reads in the chain.
Case 2, o1 and o2 are linked by a transitive chain containing only synchronized
reads. In this case, we can link the ordinary operations to each other. This will
be called transitive order, <T. In Definition 5.4, <synch+ refers to traversing one or
more edges of <synch.

Definition 5.4. Transitive order, <T, is defined as

if o <D sr <synch+ sw then o <T sw
if sw <synch+ sr <D o then sw <T o
if o1 <D sr <D o2 then o1 <T o2
if o1 <D sr1 <synch+ sr2 <D o2 then o1 <T o2
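
The four rules of Definition 5.4 can be turned into a small computation over edge sets. The sketch below is one possible reading, not a definitive implementation: it assumes operations are (kind, process, variable, value) tuples, that o, o1, and o2 range over ordinary (non "sr"/"sw") operations, and that <D and <synch are supplied as sets of pairs. All function names are hypothetical.

```python
def transitive_closure(edges):
    """Smallest transitive relation containing edges (one or more steps)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_ordinary(op):
    return op[0] not in ("sr", "sw")


def transitive_order(d_edges, synch_edges):
    """Edges of <T derived from <D and <synch by the rules of Definition 5.4."""
    plus = transitive_closure(synch_edges)  # <synch+
    into_sr = [(o, s) for o, s in d_edges if is_ordinary(o) and s[0] == "sr"]
    out_of_sr = [(s, o) for s, o in d_edges if s[0] == "sr" and is_ordinary(o)]
    t_edges = set()
    for o, sr in into_sr:                       # rule 1
        for a, b in plus:
            if a == sr and b[0] == "sw":
                t_edges.add((o, b))
    for sr, o in out_of_sr:                     # rule 2
        for a, b in plus:
            if b == sr and a[0] == "sw":
                t_edges.add((a, o))
    for o1, sr1 in into_sr:                     # rules 3 and 4
        for sr2, o2 in out_of_sr:
            if sr1 == sr2 or (sr1, sr2) in plus:
                t_edges.add((o1, o2))
    return t_edges


o1 = ("w", "p1", "x", 1)
o2 = ("r", "p2", "x", 1)
sr1 = ("sr", "p1", "s", 0)
sr2 = ("sr", "p2", "s", 0)
sw = ("sw", "p2", "t", 1)
t_edges = transitive_order({(o1, sr1), (sr2, o2)}, {(sr1, sr2), (sr2, sw)})
```

In this made-up example the chain o1 <D sr1 <synch sr2 <D o2 yields o1 <T o2 by the fourth rule, and o1 <D sr1 <synch+ sw yields o1 <T sw by the first.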

Now we have another equivalent definition of synchronized model consistency
where each per-process view contains only its own reads, whether ordinary or
synchronized.

Theorem 5.5. The following definition is equivalent to synchronized model
consistency with the new definition <S

$$\forall i \in P \ \exists\ SerialView(<_{iLocal} \cup <_{synch} \cup <_{ord} \cup <_{D} \cup <_{T} \mid (or, i, *, *) \cup (ow, *, *, *) \cup (sr, i, *, *) \cup (sw, *, *, *))$$

Proof: By Lemma 4.26 it must still be possible to construct the view
<seq. Also, the views must still respect <S because any transitive chain
in <D and <synch must be reflected in the operations present in each
view through <D, <synch, and <T.

Now this definition can be generalized. The definition says that sequential
consistency operations must be sequentially consistent with each other, slow consistent
operations must be slow consistent with each other, and operations of different
consistency levels must respect <D and <T between them. There is no reason this
definition has to be limited to sequential and slow consistency, or limited to just
two consistency levels. Each operation can be submitted under a different
consistency model; any model within the lattice. This leads to a generalized definition
of memory consistency. Each operation is considered to be labeled with a subset
of the consistency properties, and two operations must respect an order such as
process order if they are both labeled with the global process order property.

Process p1                      Process p2
The initial value of x is 0;

y = f(input);
x = 1;              synch
                                while (x == 0) wait;    synch
                                read(y);

Fig. 20. A Data-Race-Free Program

Definition 5.6. Two operations are ordered by synchronization order, o1 <synch o2,
iff

both are labeled GPO and o1 <po o2, or
both are labeled GDO and o1 <do o2, or
both are labeled GWO and o1 <wo o2, or
both are labeled GAO and o1 <so o2 or o1 <ao(<so) o2, or
both are labeled GPDO and o1 <pdo o2, ...
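
Definition 5.6 can be read as a filter over the property orders: an edge survives only when both endpoints carry the corresponding label. The sketch below illustrates this reading under stated assumptions: labels maps each operation to the set of property names it was submitted under, property_orders maps a property name to its set of (o1, o2) edges (for GAO the caller would pass the union of <so and <ao(<so)), and the function name is hypothetical.

```python
def synchronization_order(labels, property_orders):
    """Edges (o1, o2) whose endpoints are both labeled with the property
    whose order relates them."""
    edges = set()
    for prop, order in property_orders.items():
        for o1, o2 in order:
            if prop in labels[o1] and prop in labels[o2]:
                edges.add((o1, o2))
    return edges


a = ("w", "p1", "x", 1)
b = ("w", "p1", "y", 2)
c = ("r", "p2", "x", 1)
labels = {a: {"GPO", "GWO"}, b: {"GPO"}, c: {"GWO"}}
property_orders = {
    "GPO": {(a, b), (b, c)},  # (b, c) is dropped: c is not labeled GPO
    "GWO": {(a, c)},
}
synch_edges = synchronization_order(labels, property_orders)
```

When every operation carries the same labels, the result is simply the union of the labeled property orders, which is how the non-synchronized models are recovered.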

Definition 5.7. For a given definition of <D, an execution satisfies generalized
memory consistency iff

$$\forall i \in P \ \exists\ SerialView(<_{iLocal} \cup <_{synch} \cup <_{D} \cup <_{T} \mid (r, i, *, *) \cup (w, *, *, *))$$

So a consistency model is defined by specifying <D and labeling operations with
consistency properties. To simulate the non-synchronized models, <D is empty and
all operations are labeled with the consistency properties of that model. <synch
reduces to the union of the orders representing the labeled properties. For
example, if all operations are labeled GPO+GWO, this definition reduces to the original
definition of causal consistency. To simulate the synchronized consistency
models, use <D given for that model. Synchronization operations are labeled with
GPO+GWO+GAO, and ordinary operations are labeled with GPDO. This
definition can also accommodate the variant of release consistency where synchronization
operations respect processor consistency. To simulate location consistency all that
is needed is to replace SerialView with SerialPartialView as given in Definition 3.11.

These new definitions also allow new ideas about what it means for a program
to be data-race-free [Adve and Hill 1993]. A data-race-free program is one that
will only produce sequential executions even when the memory system supports
a particular consistency model weaker than sequential consistency. For example,
Figure 20 contains a program that will only produce sequential executions when it is
run under weak consistency. This program is said to be weak-sequential
data-race-free. A program may be data-race-free for some non-sequential consistency models
and not data-race-free for others [Gharachorloo et al. 1990]. The operations on x
are synchronization operations. In order to exit the loop, p2 must read 1 from x.
Therefore, the following ordering restrictions exist.

(w,p1,y,f(input)) <D (w,p1,x,1) ↦ (r,p2,x,1) <D (r,p2,y,?)

The view for p2 must contain all of these operations. If weak consistency is
enforced, then <D must be respected, and ↦ must be respected because the view
is serial. There are no other writes to y, so (r,p2,y,?) must return the value
written by (w,p1,y,f(input)). If this value is returned then the execution is also
sequentially consistent. One goal of synchronized consistency models is to simulate
sequential consistency in this manner. This work provides a new, formal definition
of what it means to be a data-race-free program. A program is data-race-free if
and only if, for any execution produced by the program:

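Given the definition of <D and labeling of operations required for weak
consistency:

$$\exists <_{so} \ \forall i \ \exists\ SerialView(<_{iLocal} \cup <_{synch} \cup <_{D} \cup <_{T} \mid (*, i, *, *) \cup (w, *, *, *))$$

implies

$$\exists <_{so} \ \forall i \ \exists\ SerialView(<_{po} \cup <_{wo} \cup <_{ao(<_{so})} \cup <_{so} \mid (*, i, *, *) \cup (w, *, *, *))$$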

This literally says that if the program produces a weakly consistent execution,
then that same execution is also sequentially consistent. If the program is run in
an environment that only produces weakly consistent executions, then the program
will only produce sequentially consistent executions. This definition of
data-race-free is very general, but may not be very helpful to programmers. It gives
no insight into how to write a program that satisfies the condition, and it may be hard
to prove that a particular program satisfies the condition. For example, it does not
even require that the same definition of serial order be used to produce the weakly
consistent views as the sequentially consistent views. One could provide simpler,
conservative definitions that are easier to implement and prove, but still enforce
the above condition. For example, if every pair of operations ordered by
$<_{PO} \cup <_{WO} \cup <_{AO}(<_{SO}) \cup <_{SO}$ were also ordered by
$<_{iLocal} \cup <_{synch} \cup <_D \cup <_T$, then the condition would hold.
A further restriction along these lines is to say that
every pair of ordinary operations to the same variable must be separated by control
and data dependencies among synchronization operations, which is the traditional
definition of data-race-free. This new uniform notation may allow more precise,
less conservative formulations of the class of data-race-free programs.
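
The conservative condition above amounts to a containment check between two
relations. The following is a minimal Python sketch of such a check, assuming
that "ordered by" means reachable in the transitive closure of a relation and
that a relation is given as a set of (earlier, later) pairs. The function names
`transitive_closure` and `covered_by` are illustrative and do not come from the
original work.

```python
def transitive_closure(edges):
    """Return the transitive closure of a relation given as (a, b) pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # Chain a < b and b < d into a < d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def covered_by(strong_edges, weak_edges):
    """True if every pair ordered by strong_edges is also ordered by weak_edges.

    Assumption: "ordered by" is read as membership in the transitive closure.
    """
    return set(strong_edges) <= transitive_closure(weak_edges)
```

With `weak = {("a", "b"), ("b", "c")}`, the call `covered_by({("a", "c")}, weak)`
holds because the pair is reachable through `b`, while a pair in the opposite
direction is not covered.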

6. CONCLUSIONS AND FUTURE WORK 

The thesis of this work is that every useful shared memory consistency model (that
is, every well-known and often-used model in the literature) can be described by a
single unifying framework. This work presents such a framework in the form of a lattice
of primitive consistency properties, and a theory of transitions within the lattice.
Shared memory can be viewed as an abstract API of interprocess communication
parameterized by its consistency model. This API can be implemented in
environments with physically shared memory banks in hardware, or in environments with
no physically shared memory, as in distributed shared memory systems. This style
of interprocess communication is appropriate for many types of applications, which
can leverage research done on memory implementations and memory consistency
models.

The first contribution of this work is the discovery of four fundamental
consistency properties. Global Process Order enforces the condition that all operations
by a single process are seen everywhere in the system to occur in the order in which
they were submitted. Global Data Order enforces the condition that for each
variable, there exists at least one total order of operations which every process can
agree could have been the actual order of those operations. Combining these two
orders produces another consistency model, GPO+GDO, very similar to processor
consistency. The difference arises from the fact that there may be more than one
possible total order on each variable which satisfies data order. However, data order can
be augmented to be a total order on operations to each variable. Processor
consistency is equivalent to process order plus this augmented data order. This method of
combining consistency properties is a general method which can be used to create
a lattice of consistency models. Any two properties can be combined in this way
to produce a consistency model stronger than either property alone. This work has
also identified another combination operator which produces a new model weaker
than either property alone. In this case, GPDO produces slow consistency. Thus,
all possible combinations of consistency properties produce a lattice of models.

The third property, Global Write-read-write Order, enforces aspects of causality.
It is defined such that GPO+GWO is equivalent to causal consistency. It is the
weakest property (the smallest set of edges) for which this is true. The fourth
property is Global Anti Order. Anti order is defined such that all four properties
combined produce a model equivalent to sequential consistency. To accomplish this,
Global Anti Order requires two ordering relations among operations, anti order and
serial order. Serial order captures the restriction that every read must read from
the most recent write. Anti order is based on both serial order and data order. This
complexity is required because no weaker definition of anti order was sufficient to
enforce equivalence to sequential consistency. Another side effect of this complexity
is that Global Anti Order is not orthogonal to all three other properties. It is strictly
stronger than data order.

The second contribution of this work is the concept of a consistency lattice. As 
stated before, enumerating every combination of the four consistency properties 
with both combination operators produces a lattice of consistency models. The 
strongest model in the lattice is sequential consistency, and the weakest is local 
consistency. Every non-synchronized consistency model described in Subsection 2.2 
is equivalent to a node in this lattice. The lattice model validates the derived
consistency properties as necessary and sufficient to describe all such models.
Furthermore, for every consistency model in the lattice there exists a non-empty set of
executions accepted by that model and no stronger model in the lattice.
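
As an illustration of how property combinations form a lattice, the following
Python sketch enumerates the models reachable with the strengthening ("+")
operator only. It assumes that a model can be represented by its set of
properties, that one model is at least as strong as another when its property
set is a superset, and that Global Anti Order implies Global Data Order, as
stated above. The weakening operator (which yields models such as GPDO) is not
modeled, and the names `IMPLIES`, `close`, `plus_models`, and
`stronger_or_equal` are hypothetical.

```python
from itertools import combinations

PROPERTIES = ("GPO", "GDO", "GWO", "GAO")

# Assumption taken from the text: Global Anti Order is strictly stronger
# than data order, so any model containing GAO also enforces GDO.
IMPLIES = {"GAO": {"GDO"}}


def close(props):
    """Add every property implied by a property already in the set."""
    closed = set(props)
    for p in props:
        closed |= IMPLIES.get(p, set())
    return frozenset(closed)


def plus_models():
    """Enumerate the distinct models formed by combining properties with '+'."""
    return {
        close(combo)
        for size in range(len(PROPERTIES) + 1)
        for combo in combinations(PROPERTIES, size)
    }


def stronger_or_equal(a, b):
    """True if model a enforces every property that model b enforces."""
    return close(a) >= close(b)
```

Under these assumptions the enumeration yields 12 distinct nodes rather than
16, because each of the four subsets that contain GAO but not GDO collapses
onto the corresponding subset that contains both.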

The third contribution of this work is that the lattice includes five previously
unnamed, non-empty consistency models: GWO, GAO, GDO+GWO, GPO+GAO, and
GWO+GAO. We believe the most promising of these is GDO+GWO. It is a
data-centric version of causality where operations are placed in causal order when they
are applied to their variable, not when they are issued by their process.

The fourth contribution of this work is a transition theory over the consistency 
lattice. The uniform lattice framework assists in the development of the transition 
theory because any two models can be compared by their properties, and
transitions can be viewed as adding or removing properties. The transition theory was
evaluated against synchronized consistency models, and every synchronized model
described in Subsection 2.3 can be modeled by this transition theory. This led to the
development of a single statement of consistency called generalized consistency. Under
generalized consistency, every operation is labeled with a set of consistency
properties. Consistency requirements among operations depend on their labelings. If
every operation is labeled with the same set of properties, generalized consistency 
simulates the non-synchronized consistency model represented by the combination 
of those properties. Various other labelings simulate the transitions equivalent to 
the synchronized models. 

In the future, this work can be extended in several directions. In the lattice, the 
five new consistency models need to be examined to determine intuitive definitions 
of the effects enforced by those models, and whether existing applications may be 
able to take better advantage of the new models. The space of consistency models 
around processor consistency needs to be explored in more detail, as well as other
methods of combining properties such as $GPO \cap GDO$ and GPAO. Finally, efficient
implementations could be examined with regard to which consistency properties
they enforce. A lattice of implementations related to the lattice of consistency 
models would be helpful in automating selection of memory implementations. 
APPENDIX

Definition A.1. An execution is a set of processes, $P$, a set of shared variables,
$V$, a set of operations, $O$, and two partial orders on $O$: process order, $<_{PO}$, and
writes-to order, $\mapsto$.

Definition A.2. An operation is a tuple $(op, i, x, v)$ where $op$ is $r$ for a read, $w$ for
a write, or $o$ if the type of operation is unknown; $i \in P$ is the process submitting
the operation; $x \in V$ is the variable to which the operation is applied; and $v$ is a
valid value for the variable $x$.

Definition A.3. An operation pattern is written like an operation with $*$ in place
of one or more of the attributes. It represents the set of all operations in $O$ that
match the pattern in all attributes that are not $*$.

For example, $(r, p_1, x, 5)$ denotes that process $p_1$ read the variable $x$ and received
the value 5, and $(w, *, *, *)$ denotes the set of all write operations.
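
Operations and operation patterns map naturally onto tuples with a wildcard.
The following Python sketch is one possible encoding, assuming that the string
`"*"` is reserved as the wildcard and is never used as a real process, variable,
or value. The names `Operation`, `matches`, and `select` are illustrative only.

```python
from typing import NamedTuple

WILDCARD = "*"  # assumed to be reserved; never a real attribute value


class Operation(NamedTuple):
    op: str        # "r" for a read, "w" for a write
    proc: str      # process submitting the operation
    var: str       # variable the operation is applied to
    value: object  # value read or written


def matches(pattern, operation):
    """True if the operation agrees with the pattern on every non-* attribute."""
    return all(p == WILDCARD or p == a for p, a in zip(pattern, operation))


def select(pattern, operations):
    """Return the set of operations denoted by an operation pattern."""
    return {o for o in operations if matches(pattern, o)}
```

For instance, `select(("w", "*", "*", "*"), ops)` returns every write in `ops`,
mirroring the pattern $(w, *, *, *)$.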

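Definition A.4. The set of operations, $O$, is

\[ O = \Big( \bigcup_{i \in P} \text{the operations submitted by } i \Big) \cup \Big( \bigcup_{x \in V} (w, \epsilon, x, \bot) \Big) \]

where $\epsilon$ is a special symbol not used to denote any process, and $\bot$ is a special
value that cannot be written by any process. The operation $(w, \epsilon, x, \bot)$ is called
the initial write of $x$.

Definition A.5. Local order for process $i$, $<_{iLocal}$, is

\[ <_{iLocal} \; = \; \big( \text{a total order on } (*, i, *, *) \big) \cup \Big( \bigcup_{x \in V,\; o_i \in (*, i, *, *)} (w, \epsilon, x, \bot) <_{iLocal} o_i \Big) \]

Definition A.6. Process order, $<_{PO}$, is

\[ <_{PO} \; = \; \bigcup_{i \in P} <_{iLocal} \]

Definition A.7. Writes-to order, $\mapsto$, satisfies

\[ \forall (r, i, x, v) \in O \;\; \exists \text{ a unique } (w, j, x, v) \in O \text{ such that } (w, j, x, v) \mapsto (r, i, x, v) \]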

These definitions say that the set O includes the operations submitted by all 
processes plus an initial write for each variable. Operations by a single process 
are totally ordered and are ordered after all initial writes by local order. Process 
order is the union of all local orders. Without loss of generality, assume that every 
variable has an initial write, and writes are uniquely valued. As a consequence of 
this, for every read there exists exactly one write that writes-to that read. Writes-to 
order is redundant with the values returned by read operations. Knowing either 
one determines the other, but both are defined for convenience. 
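
The construction described above can be sketched in Python as follows. This is
a minimal illustration, not the authors' implementation: it assumes operations
are plain `(op, process, variable, value)` tuples, that all operations are
distinct, that `"eps"` stands in for the special symbol $\epsilon$, and that
`None` stands in for $\bot$. The function names are hypothetical.

```python
EPSILON = "eps"  # stand-in for the special non-process symbol
BOTTOM = None    # stand-in for the value no process can write


def build_execution(programs, variables):
    """Build the operation set and process order.

    programs maps each process to its list of (op, var, value) steps,
    given in the order in which the process submitted them.
    """
    initial = [("w", EPSILON, x, BOTTOM) for x in sorted(variables)]
    operations = set(initial)
    process_order = set()
    for proc, steps in programs.items():
        ops = [(op, proc, var, val) for op, var, val in steps]
        operations.update(ops)
        # Local order: every initial write precedes every operation of proc.
        process_order.update((w0, o) for w0 in initial for o in ops)
        # Local order: the process's own operations are totally ordered.
        process_order.update(
            (ops[a], ops[b])
            for a in range(len(ops))
            for b in range(a + 1, len(ops))
        )
    return operations, process_order


def writes_to(operations):
    """Derive writes-to order from read values, assuming unique write values."""
    writers = {}
    for o in operations:
        if o[0] == "w":
            key = (o[2], o[3])  # (variable, value)
            if key in writers:
                raise ValueError("writes must be uniquely valued")
            writers[key] = o
    return {(writers[(r[2], r[3])], r) for r in operations if r[0] == "r"}
```

Because writes are uniquely valued, `writes_to` recovers the writes-to edges
from the read values alone, which is the redundancy noted above.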
(a)

$P = \{p_1, p_2\}$
$V = \{x, y\}$
$O = \{(w, \epsilon, x, \bot), (w, \epsilon, y, \bot), (w, p_1, x, 1), (r, p_1, y, 2), (r, p_2, x, 1), (w, p_2, y, 2)\}$
$(w, \epsilon, x, \bot) <_{PO} (w, p_1, x, 1) <_{PO} (r, p_1, y, 2)$
$(w, \epsilon, y, \bot) <_{PO} (w, p_1, x, 1) <_{PO} (r, p_1, y, 2)$
$(w, \epsilon, x, \bot) <_{PO} (r, p_2, x, 1) <_{PO} (w, p_2, y, 2)$
$(w, \epsilon, y, \bot) <_{PO} (r, p_2, x, 1) <_{PO} (w, p_2, y, 2)$
$(w, p_1, x, 1) \mapsto (r, p_2, x, 1)$
$(w, p_2, y, 2) \mapsto (r, p_1, y, 2)$

(b)

$P = \{p_1\}$
$V = \{x\}$
$O = \{(w, \epsilon, x, \bot), (w, p_1, x, 1), (w, p_1, x, 2), (r, p_1, x, 1)\}$
$(w, \epsilon, x, \bot) <_{PO} (w, p_1, x, 1) <_{PO} (w, p_1, x, 2) <_{PO} (r, p_1, x, 1)$
$(w, p_1, x, 1) \mapsto (r, p_1, x, 1)$

Fig. 21. Two Executions



An execution defines the operations that were submitted to a memory system 
and specifies the externally visible behavior of the memory system by the writes-to 
relation. Now we need to relate the behavior of the memory system to correctness 
with respect to a consistency model. Consider Figure 21. Execution (a) corresponds
to a sequentially consistent execution. From the set of operations, $O$, and the
process order we see that $p_1$ wrote x and then read y, and $p_2$ read x and then wrote
y. From the writes-to order we see that $p_2$ read $p_1$'s write, and $p_1$ read $p_2$'s write.
This corresponds to a sequential order of:

\[ (w, \epsilon, x, \bot) < (w, \epsilon, y, \bot) < (w, p_1, x, 1) < (r, p_2, x, 1) < (w, p_2, y, 2) < (r, p_1, y, 2) \]

where $<$ denotes an unnamed total order. Execution (b), however, is a little
disconcerting. There is one process, $p_1$, which wrote 1 to x, then wrote 2 to x, and then
read x. Unfortunately, the read returned the value 1 from the first write, and not 2
from the second. When we try to create a total order we run into a contradiction.
If the order is:

\[ (w, \epsilon, x, \bot) < (w, p_1, x, 1) < (w, p_1, x, 2) < (r, p_1, x, 1) \]

then the read does not read from the most recent write, but if the order is:

\[ (w, \epsilon, x, \bot) < (w, p_1, x, 1) < (r, p_1, x, 1) < (w, p_1, x, 2) \]

then this violates process order. The important thing to note is that this does
qualify as an execution. Imagine a computer with out-of-order instruction
dispatching. If this dispatching mechanism were buggy it might accidentally switch
the order of a read and write to the same variable. Execution (b) models exactly
this sort of phenomenon. However, it is not likely that this execution will be
deemed correct by any consistency model. The problems we just saw with creating
a total order also give us a hint about how to define a consistency model in terms
of allowable executions.

Definition A.8. A view is a total order on a set of operations representing one
process's view of the sequence of events within the memory system.

Definition A.9. A view is serial iff every read returns the value from the most
recent (defined by the order of the view) write to the same variable.

Definition A.10. A view is said to respect a relation if every edge in the relation
appears in the view.

Definition A.11. A relation, $<$, can be restricted to a subset of operations,
denoted $< \mid subset$, which results in a relation containing the set of edges that are
both in $<$ and between two operations in subset.

The notation $SerialView(< \mid subset)$ denotes a serial view over the operations in
subset respecting the relation $< \mid subset$. Usually, subset will be defined in terms
of operation patterns, or, if subset is the entire set $O$, the shorthand $SerialView(<)$
will be used.
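
Definitions A.8 through A.11 can be checked mechanically on small executions.
The sketch below is a brute-force illustration that tries every permutation of
the operations, so it is only practical for a handful of operations. It assumes
operations are `(op, process, variable, value)` tuples and relations are sets
of (earlier, later) pairs; the function names are hypothetical.

```python
from itertools import permutations

_NO_WRITE = object()  # sentinel: no write to the variable seen yet


def is_serial(view):
    """True if every read returns the value of the most recent write in view."""
    last_write = {}
    for op, _proc, var, val in view:
        if op == "w":
            last_write[var] = val
        elif last_write.get(var, _NO_WRITE) != val:
            return False
    return True


def respects(view, relation):
    """True if every edge of the relation appears in the view's total order."""
    position = {o: k for k, o in enumerate(view)}
    return all(position[a] < position[b] for a, b in relation)


def restrict(relation, subset):
    """Keep only the edges whose endpoints both lie in subset."""
    return {(a, b) for a, b in relation if a in subset and b in subset}


def serial_views(relation, subset):
    """Yield every serial view over subset that respects the restricted relation."""
    edges = restrict(relation, subset)
    # Sorting by repr gives a deterministic order even with mixed value types.
    for view in permutations(sorted(subset, key=repr)):
        if respects(view, edges) and is_serial(view):
            yield view
```

Applied to the executions of Figure 21 with process order as the relation,
execution (a) admits serial views while execution (b) admits none, which
matches the contradiction described above.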